- IEEE P1003.2 Draft 11.2 - September 1991
-
-
- Copyright (c) 1991 by the
- Institute of Electrical and Electronics Engineers, Inc.
- 345 East 47th Street
- New York, NY 10017, USA
- All rights reserved as an unpublished work.
-
- This is an unapproved and unpublished IEEE Standards Draft,
- subject to change. The publication, distribution, or
- copying of this draft, as well as all derivative works based
- on this draft, is expressly prohibited except as set forth
- below.
-
- Permission is hereby granted for IEEE Standards Committee
- participants to reproduce this document for purposes of IEEE
- standardization activities only, and subject to the
- restrictions contained herein.
-
- Permission is hereby also granted for member bodies and
- technical committees of ISO and IEC to reproduce this
- document for purposes of developing a national position,
- subject to the restrictions contained herein.
-
- Permission is hereby also granted to the preceding entities
- to make limited copies of this document in an electronic
- form only for the stated activities.
-
- The following restrictions apply to reproducing or
- transmitting the document in any form: 1) all copies or
- portions thereof must identify the document's IEEE project
- number and draft number, and must be accompanied by this
- entire notice in a prominent location; 2) no portion of this
- document may be redistributed in any modified or abridged
- form without the prior approval of the IEEE Standards
- Department.
-
- Other entities seeking permission to reproduce this
- document, or any portion thereof, for standardization or
- other activities, must contact the IEEE Standards Department
- for the appropriate license.
-
- Use of information contained in this unapproved draft is at
- your own risk.
-
- IEEE Standards Department
- Copyright and Permissions
- 445 Hoes Lane, P.O. Box 1331
- Piscataway, NJ 08855-1331, USA
- +1 (908) 562-3800
- +1 (908) 562-1571 [FAX]
-
- P1003.2 Draft 11.2
- ISO/IEC CD 9945-2.2
-
-
-
-
-
-
-
-
- STANDARDS PROJECT
-
- Draft Standard for Information Technology --
- Portable Operating System Interface (POSIX)
- Part 2:
- Shell and Utilities
-
-
- Sponsor
- Technical Committee on Operating Systems
- and Application Environments
- of the
- IEEE Computer Society
-
- Work Item Number: JTC 1.22.21.2
-
-
- Abstract: ISO/IEC 9945-2: 199x (IEEE Std 1003.2-199x) is part of the
- POSIX series of standards for applications and user interfaces to open
- systems. It defines the applications interface to a shell command
- language and a set of utility programs for complex data manipulation.
-
- Keywords: API, application portability, data processing, open systems,
- operating system, portable application, POSIX, shell and utilities
-
-
- P1003.2 / D11.2
- September 1991
-
-
- Copyright (c) 1991 by the
- Institute of Electrical and Electronics Engineers, Inc.
- 345 East 47th Street
- New York, NY 10017, USA
- All rights reserved.
-
- This is an unapproved IEEE Standards Draft, subject to change. Permission
- is hereby granted for IEEE Standards Committee participants to reproduce
- this document for purposes of IEEE standardization activities. Permission
- is also granted for member bodies and technical committees of ISO and IEC
- to reproduce this document for purposes of developing a national position.
- Other entities seeking permission to reproduce this document for
- standardization or other activities, or to reproduce portions of this
- document for these or other uses, must contact the IEEE Standards
- Department for the appropriate license. Use of information contained in
- this unapproved draft is at your own risk.
-
- IEEE Standards Department
- Copyright and Permissions
- 445 Hoes Lane, P.O. Box 1331
- Piscataway, NJ 08855-1331, USA
- +1 (908) 562-3800
- +1 (908) 562-1571 [FAX]
-
- September 1991 SH XXXXX
-
- BEGIN_RATIONALE
-
- Editor's Notes
-
- The IEEE ballot for Draft 11.2 is due at the IEEE Standards Office on 2
- 21 October 1991. You are also asked to e-mail any balloting comments to 2
- me: hlj@posix.com. Please read the balloting instructions in Annex G. 2
-
- This document is also registered as ISO/IEC CD 9945-2.2. The 2
- international balloting period is unrelated to the IEEE balloting. 2
- Member bodies, please consult any accompanying materials from SC22. 2
- Also, please read the remainder of these Editor Notes to see explanations 2
- of stylistic differences between a draft and the final standard 2
- (copyright notices, inline rationale, etc.). 2
-
- The IEEE balloting will be on hiatus during the international balloting 2
- period, which is probably scheduled to complete at the May 1992 WG15 2
- meeting. This is in accordance with the WG15 Synchronization Plan, which 2
- calls for coordinated balloting to result in the approval of an IEEE/ANSI 2
- standard that is identical to the ISO/IEC Draft International Standard 2
- (DIS). There will be a final recirculation of a full draft (12) to the 2
- IEEE balloting group before it is sent to the Standards Board. 2
-
- This section will not appear in the final document. It is used for 2
- editorial comments concerning this draft. Draft 11.2 is the fifth 2
- recirculation of the balloting process that began in December 1988 with 2
- Draft 8. Please consult Annex G and the cover letter for the ballot that
- accompanied this draft for information on how the recirculation is
- accomplished.
-
- This draft uses small numbers in the right margin in lieu of change bars. 2
- ``2'' denotes changes from Draft 11.1 to Draft 11.2. ``1'' denotes 2
- changes from Draft 11 to Draft 11.1. All diff-marks prior to Draft 11.1 1
- have been removed. Trivial informative (i.e., non-normative) changes and
- purely editorial changes such as grammar, spelling, or cross references
- are not diff-marked.
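-
- As an informal illustration (not part of the draft), the diff-marks can
- be filtered mechanically from a plain-text copy. The sketch below assumes
- that the mark is the last whitespace-separated field on a line; the
- function name marked_lines and the file name in the usage comment are
- hypothetical.
-
```shell
# Hypothetical helper: write the lines of a plain-text draft whose
# right-margin diff-mark equals the given mark (for example, 2).
# Assumption: the mark is the last whitespace-separated field, so an
# unmarked body line that happens to end in the same token also matches.
marked_lines() {
    awk -v m="$1" 'NF > 1 && $NF == (m "")' "$2"
}

# Usage (hypothetical file name): marked_lines 2 d11.2.txt
```
-
- The comparison is forced to be a string comparison so that only an exact
- mark matches; this is a convenience for reviewers, not a substitute for
- reading the changed pages.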
-
- There are two versions of Draft 11.2 in circulation. The full printed 2
- version was sent for SC22 balloting and is also available from the IEEE 2
- for a duplication fee [call (800) 678-IEEE or +1 (908) 981-1393 outside 2
- the US]. The version sent to the IEEE balloting group consists (mostly) 1
- of pages containing normative changes. This was done to focus balloting 1
- group attention on the changes being balloted and to reduce costs and 1
- administrative time. The changes-only version contains a few handwritten 1
- pointers in the margins to show context where it would not be obvious; 1
- numbers near the normal page numbers show what the corresponding Draft 11 1
- page number would be. 1
-
-
- Copyright (c) 1991 IEEE. All rights reserved.
- This is an unapproved IEEE Standards Draft, subject to change.
-
- The following minor global changes have been made without diff-marks:
-
- - Instances of the verbs ``print,'' ``report,'' ``display,''
- ``issue,'' and ``list'' are being changed to ``write'' as part of a
- general cleanup related to the UPE, where ``write'' and ``display''
- have precise meanings. This is probably not completed and will
- continue throughout ballot resolution and the final editing
- process.
-
- ISO and IEEE have tightened up the requirements for the use of ``shall.''
- We have been directed that all sentences that are currently declarative
- must be changed to use the ``shall'' form if they pose a requirement:
- ``The status is zero'' -> ``The status shall be zero.'' One specific
- instance of this was changing ``The following options/operands are
- available'' to ``The following options/operands shall be supported by the
- implementation.'' Another: ``The foo utility follows the utility
- argument syntax standard described in 2.11.2'' to ``The foo utility shall
- conform to the utility argument syntax guidelines described in 2.10.2.''
- It is a tedious process to do all these translations and they are not
- complete. They will be completed on a draft-by-draft basis. In the
- meantime, please assume that all declarative sentences mean to use
- ``shall'' and treat them as either implementation or application
- requirements unless they specifically say ``may,'' ``should,'' or
- ``can.''
-
- The rationale text for all the sections has been temporarily moved from
- Annex E and interspersed with the appropriate sections. The rationale
- sections are identified with the phrase ``(This subclause is not a part
- of P1003.2)'' in the heading. This colocation of rationale with its
- accompanying text was done to encourage the Technical Reviewers to
- maintain the rationale text, as well as provide explanations to the
- reviewers and balloters. Not all of the Rationale sections have contents
- as of this draft. The empty sections may be partially distracting, but
- we feel it is imperative to keep them there to encourage the Technical
- Reviewers to provide rationale as needed.
-
- Please report typographical errors to:
-
- Hal Jespersen
- POSIX Software Group
- 447 Lakeview Way
- Redwood City, CA 94062
- +1 (415) 364-3410
- FAX: +1 (415) 364-4498
- Email: hlj@Posix.COM
-
- (Electronic mail is preferred.)
-
- The copying and distribution of IEEE balloting drafts is accomplished by
- the Standards Office. To report problems with reproduction of your copy, 2
- contact: 2
-
- Anna Kaczmarek 2
- IEEE Standards Office
- P.O. Box 1331
- 445 Hoes Lane
- Piscataway, NJ 08855-1331
-
- +1 (908) 562-3811 2
- FAX: +1 (908) 562-1571
-
- Additional copies of this draft are available for a duplication and 2
- mailing fee. Contact: 2
-
- IEEE Publications 2
- 1 (800) 678-IEEE 2
- +1 (908) 981-1393 [outside US] 2
-
- This draft is available in various electronic forms to assist the review 2
- process. Our thanks to Andrew Hume of AT&T Bell Laboratories for 2
- providing online access facilities. Note that this is a limited 2
- experiment in providing online access; future ballots may provide other 2
- forms, such as diskettes or a bulletin board arrangement, but the 2
- instructions shown here are the only methods currently available. Please 2
- also observe the additional copyright restrictions that are described in 2
- the online files. 2
-
- Assuming you have access to the Internet, the scenario is approximately 2
-
- ftp research.att.com # research's IP address is 192.20.225.2 2
- <login as netlib; password is your email address> 2
- cd posix/p1003.2/d11.2 2
- get toc index 2
- binary 2
- get p11-20.Z 2
-
- The draft is available in several forms. The table of contents can be 2
- found in toc, pages containing a particular section are stored under the 2
- section number, sets of pages are stored in files with names of the form 2
- pn-m (such as p11-20), and the entire draft is stored in all. By default, 2
- files are ASCII. A .ps suffix indicates PostScript. A .Z suffix indicates 2
- a compressed file. The file index contains a general description of the 2
- files available. 2
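-
- The suffix conventions described above can be sketched as a small shell
- function. This is an illustration only: the function name describe_file
- is hypothetical, and the combined .ps.Z case is an assumption, since the
- text does not say whether compressed PostScript files are provided.
-
```shell
# Hypothetical helper: describe a file in the draft directory from its
# name, using the stated conventions (.ps = PostScript, .Z = compressed,
# no suffix = ASCII). The .ps.Z combination is an assumption.
describe_file() {
    case $1 in
        *.ps.Z) echo "compressed PostScript" ;;
        *.Z)    echo "compressed ASCII" ;;
        *.ps)   echo "PostScript" ;;
        *)      echo "ASCII" ;;
    esac
}

# Classify the file fetched in the ftp session shown earlier.
describe_file p11-20.Z
```
-
- A file reported as compressed would need to be transferred in binary
- mode and expanded locally, for example with uncompress where that
- utility is available.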
-
- These files are also available via electronic mail by sending a message 2
- like 2
-
- send 3.4 3.5 9.2 from posix/p1003.2/d11.2 2
-
- to netlib@research.att.com. If you use email, you should not ask for the 2
- compressed version. For a more complete introduction to this form of 2
- netlib, send the message 2
-
- send help 2
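-
- The request format shown above can be assembled from a list of section
- numbers. The following is a minimal sketch; the function name
- netlib_request is hypothetical, and how the message is actually mailed
- depends on the local mail program.
-
```shell
# Hypothetical helper: build a netlib mail request for the given section
# numbers, in the "send ... from ..." form shown above.
netlib_request() {
    echo "send $* from posix/p1003.2/d11.2"
}

# Write the request for sections 3.4, 3.5, and 9.2 to standard output.
netlib_request 3.4 3.5 9.2
```
-
- On a system with a mailx-style mailer (an assumption), the output could
- be piped to it with netlib@research.att.com as the recipient.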
-
- POSIX.2 Change History
-
- This section is provided to track major changes between drafts. Since it
- was first added in Draft 11, earlier entries omit some degree of detail.
-
- Draft 11.2 [September 1991] Sixth IEEE ballot (fifth recirculation; 2
- only changed pages distributed). Second ISO/IEC CD 9945-2 2
- registration (full draft distributed). 2
-
- - The use of equivalence classes as the starting or ending 2
- points of range expressions within regular expression 2
- bracket expressions has been made unspecified. 2
-
- - The LC_COLLATE substitute keyword has been deleted. 2
-
- - cksum (4.9): Modifications to the algorithm. 2
-
- - cp (4.13): Restoration of the 2
-
- - stty (4.59): Addition of the tostop operand. 2
-
- - lex (A.2): Further clarification of ERE differences. 2
-
- - Miscellaneous clarifications to various utilities. 2
-
- Draft 11.1 [June 1991] Fifth IEEE ballot (fourth recirculation; only 1
- changed pages distributed). 1
-
- - Modification of the definition of byte and 1
- clarifications of octal/hexadecimal byte 1
- representations throughout the utilities. 1
-
- - Clarifications to the locale definition source file 1
- description in 2.5; addition of a yacc grammar. 1
-
- - Removal of pax -e character translation option. 1
-
- - Miscellaneous clarifications to various utilities. 1
-
- - Reconciliation of feature test macros and headers in 1
- Annex B with POSIX.1. 1
-
- Draft 11 [February 1991] Fourth IEEE ballot (third recirculation).
-
- - Changes in 2.3 to the treatment of regular built-ins with
- regard to their exec-able versions.
-
- - Changes to 2.4 (character names and charmap syntax) and
- 2.5 (localedef input format) as a result of
- international balloting. Addition of the
- {POSIX2_LOCALEDEF} symbol.
-
- - Changes to the shell quoting rules, arithmetic
- expression syntax, command search order, error
- descriptions, and exportable functions.
-
- - Movement of the command utility from special built-in
- status to be a utility in Section 4.
-
- - cp (4.13): Significant clarifications and interface
- changes.
-
- - date (4.15): Added field descriptor modifiers to
- handle alternate calendar forms when supported by the
- locale and implementation.
-
- - pax (4.48): Significant interface changes, including
- international character set translations.
-
- - test (4.62): Deprecated some functionality due to
- inconsistent behavior in existing implementations that
- cause portability problems in existing applications.
-
- - make (6.2): Addition of the .POSIX special target,
- return of some rules to strict existing practice.
-
- - Miscellaneous clarifications to various utilities.
-
- - The FORTRAN section now has two options associated with
- it: Development Utilities (fort77) and Runtime
- Utilities (asa).
-
- - Addition of full example profiles and charmaps from
- Denmark in Annex F.
-
- Draft 10 [July 1990] Third IEEE ballot (second recirculation).
-
- - This draft primarily has been one of clarification and
- amplification. In resolving ballot objections, large
- portions of the draft have been rewritten, affecting
- all sections, but comparatively few changes in
- [intended] functionality have occurred.
-
- - New shell command language features (see Section 3):
-
- - Utility name changes:
-
-         Draft 9      Draft 10
-         _______      ________
-         create       pathchk
-         hexdump      od
-         sendto       mailx
-
- - A few of the utilities and global sections now have a
- more formal description, using a yacc-like grammar.
-
- - Considerably more detail has been added to the
- internationalization features of the standard: global
- changes to clauses 2.4 and 2.5; new detail to the LC_*
- variables in each utility section; specification of
- LC_MESSAGES (replacing LC_RESPONSE).
-
- - Due to some ISO requirements, Sections 1 and 2 have
- been reorganized yet again, causing many cross
- reference number changes. The Related Standards annex
- has been turned into simply a Bibliography. The Non-
- Specified Language Compilers annex has been replaced by
- a Sample National Profile annex.
-
- Draft 9 [August 1989] Second IEEE ballot (first recirculation).
- Also registered as ISO/IEC CD 9945-2.1. A few minor
- corrections to some sections. :-)
-
- Draft 8 [December 1988] First IEEE ballot. Also submitted to
- ISO/IEC JTC 1/SC22 for review and comment.
-
- Draft 7 [September 1988] ``Mock ballot'' conducted by working
- group members only.
-
- POSIX.2 Technical Reviewers
-
- The individuals denoted in Table i are the Technical Reviewers for this
- draft. During balloting they are the subject matter experts who
- coordinate the resolution process for specific sections, as shown.
-
- Table i - POSIX.2 Technical Reviewers
-
- ___________________________________________________________________
- Section   Description                                  Reviewer
- ___________________________________________________________________
-
- 1         General                                      Jespersen
- 2.4,2.5   Definitions (Locales)                        Leijonhufvud  1
- 2 (rest)  Definitions (Various)                        Jespersen
- 3         Command Language                             Jespersen
- 4         Execution Environment Utilities: cp, rm      Bostic        2
- 4         Execution Environment Utilities: (the rest)  Jespersen     2
- 6         Software Development Utilities               Jespersen
- 7         Language-Independent Bindings                Jespersen     2
- A         C Development Utilities                      Jespersen
- B         C Bindings                                   Jespersen     2
- C         FORTRAN Development and Runtime Utilities    Jespersen
- D-G       Various                                      Jespersen
- ___________________________________________________________________
-
-
- Also, our special thanks to Donn Terry for writing or improving all the
- yacc-based grammars used in Draft 10.
-
- POSIX.2 Proposed Schedule
-
- This section will not appear in the final document. It is used to
- provide editorial notes regarding the proposed POSIX.2 schedule. In the
- schedule, the UPE stands for ``User Portability Extension.''
-
-
-  ______________________________________________________________________
- | Date                  | Milestone (End of Meeting)           | Draft |
- |_______________________|______________________________________|_______|
- | Sep 7-11, 1987        | Utility format frozen;               | 3     |
- | Nashua, NH            | 10% of utilities described.          |       |
- |_______________________|______________________________________|_______|
- | Dec 7-14, 1987        | 50% of utilities described;          | 4     |
- | San Diego, CA         | shell update; substantial            |       |
- |                       | progress in Sections 2, 3, 4, 8.     |       |
- |_______________________|______________________________________|_______|
- | Mar 14-18, 1988       | Utility selection frozen;            | 5     |
- | Washington, DC        | 75% described.                       |       |
- |_______________________|______________________________________|_______|
- | Jul 11-15, 1988       | 100% utilities described;            | 6     |
- | Denver, CO            | functional freeze; produce ``mock    |       |
- |                       | ballot'' and POSIX FIPS draft 7      |       |
- |_______________________|______________________________________|_______|
- | [Sep-Oct 1988]        | [Mock ballot]                        | 7     |
- |_______________________|______________________________________|_______|
- | Oct 24-28, 1988       | Resolve mock ballot objections;      | 7     |
- | Honolulu, HI          | produce first real ballot (draft 8); |       |
- |                       | UPE planning begins                  |       |
- |_______________________|______________________________________|_______|
- | [Jan-Feb 1989]        | [First ballot]                       | 8     |
- |_______________________|______________________________________|_______|
- | Jan 9-11, 1989        | Begin UPE definitions;               | 8     |
- | Ft. Lauderdale, FL    | Technical Reviewer coordination      |       |
- |                       | of first ballot responses            |       |
- |_______________________|______________________________________|_______|
- | [Feb-Apr 1989]        | [Ballot resolution]                  | 8     |
- |_______________________|______________________________________|_______|
- | Apr 24-28, 1989       | Working Group concurrence with       | 9     |
- | Minneapolis, MN       | ballot resolution; produce Draft 9   |       |
- |                       | for recirculation; UPE work          |       |
- |_______________________|______________________________________|_______|
- | Jul 10-14, 1989       | UPE work                             |       |
- | San Jose, CA          |                                      |       |
- |_______________________|______________________________________|_______|
- | [Oct 1989]            | [First Recirculation]                | 9     |
- |_______________________|______________________________________|_______|
- | [Nov-Feb 1990]        | [Ballot resolution]                  | 9     |
- |_______________________|______________________________________|_______|
- | [Aug-Sep 1990]        | [Second Recirculation]               | 10    |
- |_______________________|______________________________________|_______|
- | [Mar 1991]            | [Third Recirculation]                | 11    |
- |_______________________|______________________________________|_______|
- | [Jun 1991]            | [Fourth Recirculation]               | 11.1  | 1
- |_______________________|______________________________________|_______|
- | [Sep 1991]            | [Fifth Recirculation]                | 11.2  | 1
- |_______________________|______________________________________|_______|
- | [mid-1992]            | [IEEE Standards Board Approves??]    | 12    | 2
- |_______________________|______________________________________|_______|
- | [Jul 1990 - Apr 1992] | [Ballot .2a UPE supplement]          |       | 1
- |_______________________|______________________________________|_______|
-
- END_RATIONALE
-
- IEEE Standards documents are developed within the Technical Committees of
- the IEEE Societies and the Standards Coordinating Committees of the IEEE
- Standards Board. Members of the committees serve voluntarily and without
- compensation. They are not necessarily members of the Institute. The
- standards developed within IEEE represent a consensus of the broad
- expertise on the subject within the Institute as well as those activities
- outside of IEEE that have expressed an interest in participating in the
- development of the standard.
-
- Use of an IEEE Standard is wholly voluntary. The existence of an IEEE
- Standard does not imply that there are no other ways to produce, test,
- measure, purchase, market, or provide other goods and services related to
- the scope of the IEEE Standard. Furthermore, the viewpoint expressed at
- the time a standard is approved and issued is subject to change brought
- about through developments in the state of the art and comments received
- from users of the standard. Every IEEE Standard is subjected to review
- at least every five years for revision or reaffirmation. When a document
- is more than five years old and has not been reaffirmed, it is reasonable
- to conclude that its contents, although still of some value, do not
- wholly reflect the present state of the art. Users are cautioned to
- check to determine that they have the latest edition of any IEEE
- Standard.
-
- Comments for revision of IEEE Standards are welcome from any interested
- party, regardless of membership affiliation with IEEE. Suggestions for
- changes in documents should be in the form of a proposed change of text,
- together with appropriate supporting comments.
-
- Interpretations: Occasionally questions may arise regarding the meaning
- of portions of standards as they relate to specific applications. When
- the need for interpretations is brought to the attention of the IEEE, the
- Institute will initiate action to prepare appropriate responses. Since
- IEEE Standards represent a consensus of all concerned interests, it is
- important to ensure that any interpretation has also received the
- concurrence of a balance of interests. For this reason, the IEEE and the
- members of its technical committees are not able to provide an instant
- response to interpretation requests except in those cases where the
- matter has previously received formal consideration.
-
- Comments on standards and requests for interpretations should be
- addressed to:
-
- Secretary, IEEE Standards Board
- 445 Hoes Lane
- P.O. Box 1331
- Piscataway, NJ 08855-1331
-
-
-  __________________________________________________________________
- |IEEE Standards documents are adopted by the Institute of          |
- |Electrical and Electronics Engineers without regard               |
- |to whether their adoption may involve patents                     |
- |on articles, materials, or processes.                             |
- |Such adoption does not assume any liability to any patent owner,  |
- |nor does it assume any obligation whatever to parties adopting    |
- |the standards documents.                                          |
- |__________________________________________________________________|
-
- Contents
-
-
- PAGE
-
- Introduction....................................................... ii
- Organization of the Standard.................................... ii
- Base Documents.................................................. ii
- Related Standards Activities.................................... ii
-
- Section 1: General................................................. 1
- 1.1 Scope..................................................... 1
- 1.2 Normative References...................................... 13
- 1.3 Conformance............................................... 14
-
- Section 2: Terminology and General Requirements.................... 21
- 2.1 Conventions............................................... 21
- 2.2 Definitions............................................... 26
- 2.3 Built-in Utilities........................................ 58
- 2.4 Character Set............................................. 61
- 2.5 Locale.................................................... 69
- 2.6 Environment Variables..................................... 119
- 2.7 Required Files............................................ 126
- 2.8 Regular Expression Notation............................... 128
- 2.9 Dependencies on Other Standards........................... 161
- 2.10 Utility Conventions....................................... 172
- 2.11 Utility Description Defaults.............................. 182
- 2.12 File Format Notation...................................... 198
- 2.13 Configuration Values...................................... 204
-
- Section 3: Shell Command Language.................................. 215
- 3.1 Shell Definitions......................................... 217
- 3.2 Quoting................................................... 220
- 3.3 Token Recognition......................................... 224
- 3.4 Reserved Words............................................ 226
- 3.5 Parameters and Variables.................................. 228
- 3.6 Word Expansions........................................... 233
- 3.7 Redirection............................................... 249
- 3.8 Exit Status and Errors.................................... 255
- 3.9 Shell Commands............................................ 258
- 3.10 Shell Grammar............................................. 279
- 3.11 Signals and Error Handling................................ 288
- 3.12 Shell Execution Environment............................... 289
- 3.13 Pattern Matching Notation................................. 291
- 3.14 Special Built-in Utilities................................ 295
-
- Section 4: Execution Environment Utilities......................... 317
- 4.1 awk - Pattern scanning and processing language............ 317
- 4.2 basename - Return nondirectory portion of pathname........ 358
- 4.3 bc - Arbitrary-precision arithmetic language.............. 362
- 4.4 cat - Concatenate and print files......................... 383
- 4.5 cd - Change working directory............................. 388
- 4.6 chgrp - Change file group ownership....................... 392
- 4.7 chmod - Change file modes................................. 395
- 4.8 chown - Change file ownership............................. 405
- 4.9 cksum - Write file checksums and sizes.................... 409
- 4.10 cmp - Compare two files................................... 416
- 4.11 comm - Select or reject lines common to two files......... 420
- 4.12 command - Execute a simple command........................ 424
- 4.13 cp - Copy files........................................... 430
- 4.14 cut - Cut out selected fields of each line of a file...... 440
- 4.15 date - Write the date and time............................ 445
- 4.16 dd - Convert and copy a file.............................. 452
- 4.17 diff - Compare two files.................................. 462
- 4.18 dirname - Return directory portion of pathname............ 471
- 4.19 echo - Write arguments to standard output................. 475
- 4.20 ed - Edit text............................................ 479
- 4.21 env - Set environment for command invocation.............. 498
- 4.22 expr - Evaluate arguments as an expression................ 503
- 4.23 false - Return false value................................ 509
- 4.24 find - Find files......................................... 511
- 4.25 fold - Fold lines......................................... 521
- 4.26 getconf - Get configuration values........................ 526
- 4.27 getopts - Parse utility options........................... 531
- 4.28 grep - File pattern searcher.............................. 537
- 4.29 head - Copy the first part of files....................... 545
- 4.30 id - Return user identity................................. 549
- 4.31 join - Relational database operator....................... 554
- 4.32 kill - Terminate or signal processes...................... 559
- 4.33 ln - Link files........................................... 566
- 4.34 locale - Get locale-specific information.................. 570
- 4.35 localedef - Define locale environment..................... 577
- 4.36 logger - Log messages..................................... 583
- 4.37 logname - Return user's login name........................ 586
- 4.38 lp - Send files to a printer.............................. 589
- 4.39 ls - List directory contents.............................. 595
- 4.40 mailx - Process messages.................................. 605
- 4.41 mkdir - Make directories.................................. 610
- 4.42 mkfifo - Make FIFO special files.......................... 614
- 4.43 mv - Move files........................................... 617
- 4.44 nohup - Invoke a utility immune to hangups................ 623
- 4.45 od - Dump files in various formats........................ 627
- 4.46 paste - Merge corresponding or subsequent lines of
- files..................................................... 637
- 4.47 pathchk - Check pathnames................................. 642
-
- Copyright (c) 1991 IEEE. All rights reserved.
- This is an unapproved IEEE Standards Draft, subject to change.
-
- PAGE
-
- 4.48 pax - Portable archive interchange........................ 648
- 4.49 pr - Print files.......................................... 665
- 4.50 printf - Write formatted output........................... 672
- 4.51 pwd - Return working directory name....................... 679
- 4.52 read - Read a line from standard input.................... 682
- 4.53 rm - Remove directory entries............................. 686
- 4.54 rmdir - Remove directories................................ 692
- 4.55 sed - Stream editor....................................... 695
- 4.56 sh - Shell, the standard command language interpreter..... 706
- 4.57 sleep - Suspend execution for an interval................. 713
- 4.58 sort - Sort, merge, or sequence check text files.......... 716
- 4.59 stty - Set the options for a terminal..................... 725
- 4.60 tail - Copy the last part of a file....................... 736
- 4.61 tee - Duplicate standard input............................ 742
- 4.62 test - Evaluate expression................................ 745
- 4.63 touch - Change file access and modification times......... 756
- 4.64 tr - Translate characters................................. 762
- 4.65 true - Return true value.................................. 770
- 4.66 tty - Return user's terminal name......................... 772
- 4.67 umask - Get or set the file mode creation mask............ 775
- 4.68 uname - Return system name................................ 780
- 4.69 uniq - Report or filter out repeated lines in a file...... 784
- 4.70 wait - Await process completion........................... 790
- 4.71 wc - Word, line, and byte count........................... 795
- 4.72 xargs - Construct argument list(s) and invoke utility..... 799
-
- Section 5: User Portability Utilities Option....................... 807
-
- Section 6: Software Development Utilities Option................... 809
- 6.1 ar - Create and maintain library archives................. 809
- 6.2 make - Maintain, update, and regenerate groups of
- programs.................................................. 818
- 6.3 strip - Remove unnecessary information from executable
- files..................................................... 844
-
- Section 7: Language-Independent System Services.................... 847
- 7.1 Shell Command Interface................................... 848
- 7.2 Access Environment Variables.............................. 849
- 7.3 Regular Expression Matching............................... 849
- 7.4 Pattern Matching.......................................... 850
- 7.5 Command Option Parsing.................................... 850
- 7.6 Generate Pathnames Matching a Pattern..................... 850
- 7.7 Perform Word Expansions................................... 851
- 7.8 Get POSIX Configurable Variables.......................... 851
- 7.9 Locale Control............................................ 852
-
- Annex A (normative) C Language Development Utilities Option........ 855
- A.1 c89 - Compile Standard C programs......................... 856
- A.2 lex - Generate programs for lexical tasks................. 867
- A.3 yacc - Yet another compiler compiler...................... 884
-
- Annex B (normative) C Language Bindings Option..................... 907
- B.1 C Language Definitions.................................... 908
- B.1.1 POSIX Symbols...................................... 908
- B.1.2 Headers and Function Prototypes.................... 910
- B.1.3 Error Numbers...................................... 911
- B.2 C Numerical Limits........................................ 911
- B.2.1 C Macros for Symbolic Limits....................... 912
- B.2.2 Compile-Time Symbolic Constants for Portability
- Specifications..................................... 913
- B.2.3 Execution-Time Symbolic Constants for Portability
- Specifications..................................... 914
- B.2.4 POSIX.1 C Numerical Limits......................... 915
- B.3 C Binding for Shell Command Interface..................... 915
- B.3.1 C Binding for Execute Command...................... 916
- B.3.2 C Binding for Pipe Communications with Programs.... 919
- B.4 C Binding for Access Environment Variables................ 925
- B.5 C Binding for Regular Expression Matching................. 925
- B.6 C Binding for Match Filename or Pathname.................. 934
- B.7 C Binding for Command Option Parsing...................... 937
- B.8 C Binding for Generate Pathnames Matching a Pattern....... 942
- B.9 C Binding for Perform Word Expansions..................... 948
- B.10 C Binding for Get POSIX Configurable Variables............ 954
- B.11 C Binding for Locale Control.............................. 957
-
- Annex C (normative) FORTRAN Development and Runtime Utilities
- Options......................................................... 959
- C.1 asa - Interpret carriage-control characters............... 960
- C.2 fort77 - FORTRAN compiler................................. 964
-
- Annex D (informative) Bibliography................................. 973
-
- Annex E (informative) Rationale and Notes.......................... 977
- E.1 General................................................... 977
- E.2 Terminology and General Requirements...................... 978
- E.3 Shell Command Language.................................... 979
- E.4 Execution Environment Utilities........................... 980
- E.5 User Portability Utilities Option......................... 993
- E.6 Software Development Utilities Option..................... 993
- E.7 Language-Independent System Services...................... 994
- E.8 C Language Development Utilities Option................... 994
- E.9 C Language Bindings Option................................ 995
- E.10 FORTRAN Development and Runtime Utilities Options......... 996
-
- Annex F (informative) Sample National Profile...................... 997
-
- Annex G (informative) Balloting Instructions....................... 1091
-
- Identifier Index................................................... 1105
-
- Alphabetic Topical Index........................................... 1111
-
-
- FIGURES
-
- Figure B-1 - Sample system() Implementation....................... 922
- Figure B-2 - Sample pclose() Implementation....................... 926
- Figure B-3 - Example Regular Expression Matching.................. 933
- Figure B-4 - Argument Processing with getopt().................... 942
-
-
- TABLES
-
- Table 2-1 - Typographical Conventions............................. 22
- Table 2-2 - Regular Built-in Utilities............................ 58
- Table 2-3 - Character Set and Symbolic Names...................... 62
- Table 2-4 - Control Character Set................................. 63
- Table 2-5 - LC_CTYPE Category Definition in the POSIX Locale...... 76
- Table 2-6 - Valid Character Class Combinations.................... 81
- Table 2-7 - LC_COLLATE Category Definition in the POSIX Locale.... 84
- Table 2-8 - LC_MONETARY Category Definition in the POSIX Locale... 96
- Table 2-9 - LC_NUMERIC Category Definition in the POSIX Locale.... 101
- Table 2-10 - LC_TIME Category Definition in the POSIX Locale...... 102
- Table 2-11 - LC_MESSAGES Category Definition in the POSIX Locale.. 106
- Table 2-12 - BRE Precedence....................................... 136
- Table 2-13 - ERE Precedence....................................... 139
- Table 2-14 - C Standard Operators and Functions................... 171
- Table 2-15 - Escape Sequences..................................... 199
- Table 2-16 - Utility Limit Minimum Values......................... 205
- Table 2-17 - Symbolic Utility Limits.............................. 206
- Table 2-18 - Optional Facility Configuration Values............... 212
- Table 4-1 - awk Expressions in Decreasing Precedence.............. 322
- Table 4-2 - awk Escape Sequences.................................. 347
- Table 4-3 - bc Operators.......................................... 370
- Table 4-4 - ASCII to EBCDIC Conversion............................ 459
- Table 4-5 - ASCII to IBM EBCDIC Conversion........................ 460
- Table 4-6 - dirname Examples...................................... 474
- Table 4-7 - expr Expressions...................................... 505
- Table 4-8 - od Named Characters................................... 632
- Table 4-9 - stty Control Character Names.......................... 730
- Table 4-10 - stty Circumflex Control Characters................... 731
- Table 7-1 - POSIX.1 Numeric-Valued Configurable Variables......... 853
- Table A-1 - lex Table Size Declarations........................... 873
- Table A-2 - lex Escape Sequences.................................. 875
- Table A-3 - lex ERE Precedence.................................... 877
- Table A-4 - yacc Internal Limits.................................. 903
- Table B-1 - POSIX.2 Reserved Header Symbols....................... 911
- Table B-2 - _POSIX_C_SOURCE....................................... 911
- Table B-3 - C Macros for Symbolic Limits.......................... 914
- Table B-4 - C Compile-Time Symbolic Constants..................... 916
- Table B-5 - C Execution-Time Symbolic Constants................... 916
- Table B-6 - Structure Type regex_t................................ 928
- Table B-7 - Structure Type regmatch_t............................. 928
- Table B-8 - regcomp() cflags Argument............................. 928
- Table B-9 - regexec() eflags Argument............................. 928
- Table B-10 - regcomp(), regexec() Return Values................... 932
- Table B-11 - fnmatch() flags Argument............................. 937
- Table B-12 - Structure Type glob_t................................ 944
- Table B-13 - glob() flags Argument................................ 945
- Table B-14 - glob() Error Return Values........................... 947
- Table B-15 - Structure Type wordexp_t............................. 950
- Table B-16 - wordexp() flags Argument............................. 951
- Table B-17 - wordexp() Return Values.............................. 952
- Table B-18 - confstr() name Values................................ 955
- Table B-19 - C Bindings for Numeric-Valued Configurable
- Variables........................................................ 958
-
-
- Introduction
-
-
-
- (This Introduction is not a normative part of P1003.2 Information
- technology -- Portable Operating System Interface (POSIX) -- Part 2:
- Shell and Utilities, but is included for information only.)
-
- The purpose of this standard is to define a standard interface and
- environment for application programs that require the services of a
- ``shell'' command language interpreter and a set of common utility
- programs. It is intended for systems implementors and application
- software developers, and is complementary to ISO/IEC 9945-1: 1990 {8}
- (first in a family of ``POSIX'' standards), which specifies operating
- system interfaces and source code level functions, based on the UNIX1)
- system documentation. This standard, or ``POSIX.2,'' is based upon
- documentation and the knowledge of existing programs that assume an
- interface and architecture similar to that described by POSIX.1. (See
- 1.1 for a full description of the relationship between the standards.)
-
- The majority of this standard describes the functions of utilities that
- can interface with application programs. The standard also provides
- high-level language interfaces that the application uses to access these
- utilities and other useful, related services. These language-independent
- service interfaces are temporarily described in terms of their C language
- bindings. The C language assumed is that defined by the C Standard:
- ANSI/X3.159-1989 Programming Language C Standard produced by Technical
- Committee X3J11 of the Accredited Standards Committee X3 -- Information
- Processing Systems.
-
- Organization of the Standard
-
- The standard is divided into ten parts:
-
- - General, including a statement of scope, normative references, and
- conformance requirements. (Section 1).
-
- - Definitions, general requirements, and the environment available to
- applications. (Section 2).
-
-
-
-
- __________
- 1) UNIX is a registered trademark of UNIX System Laboratories in the USA
- and other countries.
-
- - The shell command interpreter language. (Section 3).
-
- - Descriptions of the utilities in the required ``Execution
- Environment Utilities.'' (Section 4).
-
- - Descriptions of the utilities required for user portability on
- asynchronous terminals. (Section 5 [to be provided in a future
- revision]).
-
- - Descriptions of the utilities in the optional ``Software
- Development Utilities.'' (Section 6).
-
- - Language-independent interfaces for high-level programming language
- access to shell and related services. (Section 7).
-
- - Descriptions of the utilities in the optional ``C Language
- Development Utilities.'' (Normative Annex A).
-
- - C language bindings to the interfaces in Section 7. (Normative
- Annex B).
-
- - Descriptions of the utilities in the optional ``FORTRAN Development
- and Runtime Utilities.'' (Normative Annex C).
-
- This introduction, the foreword, any footnotes, NOTES accompanying the
- text, and the informative annexes are not considered part of the
- standard. Annexes D through G are informative.
-
- Base Documents
-
- Many of the interfaces and utilities of this standard were adapted from
- materials in machine-readable forms donated by the following
- organizations:
-
- - AT&T: the System V Interface Definition (SVID) {B24},2) Issue 2,
- Volume 2. Copyright (c) 1986, AT&T; reprinted with permission.
-
- - The X/Open Company, Ltd.: the X/Open Portability Guide {B30}
- {B31}, Issues II and III, Volume 1. Copyright (c) 1989, X/Open
- Company, Ltd; reprinted with permission.
-
-
-
-
- __________
- 2) The number in braces corresponds to those of the references in 1.2
- (or the bibliographic entry in Annex D if the number is preceded by
- the letter B).
-
- - University of California, The UNIX User's Reference Manual {B28},
- 4.3 Berkeley Software Distribution, Virtual VAX-11 Version, 1986.
- Copyright (c) 1980, 1983, The Regents of the University of
- California; reprinted with permission.3)
-
- Significant reference use was also made of the following books:
-
- - Bolsky, Morris I., Korn, David G., The KornShell Command and
- Programming Language {B25}, Prentice Hall, Englewood Cliffs, New
- Jersey (1988).
-
- - Aho, Alfred V., Kernighan, Brian W., Weinberger, Peter J., The AWK
- Programming Language {B21}, Addison-Wesley, Reading, Massachusetts
- (1988).
-
- Many other proposals for functions and utilities were received from the
- various working group members, who are listed in the Acknowledgements
- section of this standard.
-
- Related Standards Activities
-
- Activities to extend this standard to address additional requirements are
- in progress, and similar efforts can be anticipated in the future.
-
- The following areas are under active consideration at this time, or are
- expected to become active in the near future:4)
-
- (1) Language-independent service descriptions of POSIX.1 {8}
-
- (2) C, Ada, and FORTRAN Language bindings to (1)
-
- (3) Verification testing methods
-
- (4) Realtime facilities
-
-
-
-
- __________
- 3) The IEEE is grateful to AT&T, UniForum, and the Regents of the
- University of California for permission to use their machine-readable
- materials.
- 4) A Standards Status Report that lists all current IEEE Computer
- Society standards projects is available from the IEEE Computer
- Society, 1730 Massachusetts Avenue NW, Washington, DC 20036-1903;
- Telephone: +1 202 371-0101; FAX: +1 202 728-9614. Working drafts of
- POSIX standards under development are also available from this
- office.
-
- (5) Secure/Trusted System considerations
-
- (6) Network interface facilities
-
- (7) System Administration
-
- (8) Graphical User Interfaces
-
- (9) Profiles describing application- or user-specific combinations
- of Open Systems standards for: supercomputing, multiprocessor,
- and batch extensions; transaction processing; realtime systems;
- and multiuser systems based on historical models
-
- (10) An overall guide to POSIX-based or related Open Systems
- standards and profiles
-
- Extensions are approved as ``amendments'' or ``revisions'' to this
- document, following the IEEE and ISO/IEC Procedures.
-
- Approved amendments are published separately until the full document is
- reprinted and such amendments are incorporated in their proper positions.
-
- If you have interest in participating in the TCOS working groups
- addressing these issues, please send your name, address, and phone number
- to the Secretary, IEEE Standards Board, Institute of Electrical and
- Electronics Engineers, Inc., P.O. Box 1331, 445 Hoes Lane, Piscataway, NJ
- 08855-1331, and ask to have this forwarded to the chairperson of the
- appropriate TCOS working group. If you have interest in participating in
- this work at the international level, contact your ISO/IEC national body.
-
- P1003.2 was prepared by the 1003.2 working group, sponsored by the
- Technical Committee on Operating Systems and Application Environments of
- the IEEE Computer Society. At the time this standard was approved, the
- membership of the 1003.2 working group was as follows:
-
- Technical Committee on Operating Systems
- and Application Environments (TCOS)
-
- Chair: Jehan-François Pâris
-
- TCOS Standards Subcommittee
-
- Chair: Jim Isaak
- Vice Chairs: Ralph Barker
- David Dodge
- Robert Bismuth
- Hal Jespersen
- Lorraine Kevra
- Treasurer: Quin Hahn
- Secretary: Shane McCarron
-
- 1003.2 Working Group Officials
-
- Chair: Hal Jespersen
- Vice Chair: Donald W. Cragun
- Editors: Hal Jespersen (1986, 1988-1991)
- Maggie Lee (1987-1988)
- Secretaries: Helene Armitage (1988-1990)
- Dave Grindeland (1991)
- Robert J. Makowski (1987-1988)
-
- Technical Reviewers
-
- Helene Armitage Ken Faubel Gary Miller
- Keith Bostic Greger Leijonhufvud Marc Teitelbaum
- John Caywood Bob Lenk Donn Terry
- Donald Cragun Mark Levine Teoman Topcubasi
- David Decot Shane McCarron David Willcox
-
- Working Group
-
- Helene Armitage Quin Hahn Jim Oldroyd
- Brian Baird Michael J. Hannah Mark Parenti
- John R. Barr Marjorie E. Harris John Peace
- Philippe Bertrand David F. Hinnant Jon Penner
- Robert Bismuth Leon M. Holmes Gerald Powell
- Jim Blondeau Ron Holt John Quarterman
- James C. Bohem Randall Howard Joe Ramus
- Kathy Bohrer Steven A. James Mike Ressler
- Keith Bostic Steve Jennings Grover Righter
- Phyllis Eve Bregman Hal Jespersen Andrew K. Roach
- Peter Brouwer Ronald S. Karr Marco P. Roodzant
- F. Lee Brown, Jr. Lorraine C. Kevra Seth Rosenthal
- Jonathan Brown Martin Kirk Maude Sawyer
- James A. Capps Brad Kline Norman K. Scherer
- Bill Carpenter Hiromichi Kogure Glen Seeds
- Steve Carter David Korn Jim Selkaitis
- John Caywood Rick Kuhn Karen Sheaffer
- Bob Claeson Mike Lambert Del Shoemaker
- Mark Colburn Maggie Lee James Soddy
- Donald W. Cragun Perry Lee Daniel Steinberg
- Dave Decot Greger Leijonhufvud Scott A. Sutter
- Terence S. Dowling Bob Lenk Ravi Tavakley
- Stephen Dum Mark Levine Marc Teitelbaum
- Dominic Dunlop Gary Lindgren Donn Terry
- Mike Edmonds John Lomas Jack Thompson
- Ron Elliott Craig Lund Teoman Topcubasi
- Richard W. Elwood Rod MacDonald Eugene Tsuno
- Hirsaki Eto Dan Magenheimer Geraldine Vitovitch
- Fran Fadden Robert J. Makowski Carl vonLoewenfeldt
- Ken Faubel Shane P. McCarron Mike Wallace
- Martin C. Fong Jim McGinness Alan Weaver
- Terance Fong John McGrory Larry Wehr
- Glenn Fowler Stuart McKaig Bruce Weiner
- Gary A. Gaudet Sunil Mehta N. Ray Wilkes
- Al Gettier Bill Middlecamp David Willcox
- Timothy D. Gill Gary W. Miller Neil Winton
- Gregory Goddard Jim Moe David Woodend
- Loretta Goudie Yasushi Nakahara Morten With
- Dave Grindeland Martha Nalebuff Ken Witte
- John Lawrence Gregg Sonya D. Neufer John Wu
- Jerry Gross Landon Noll Peggy Younger
- Douglas A. Gwyn Robin T. O'Neill Hilary Zaloom
-
- The following persons were members of the 1003.2 Balloting Group that
- approved the standard for submission to the IEEE Standards Board:
-
- Derek Kaufman X/Open Institutional Representative
- Shane McCarron UNIX International Institutional Representative
- Peter Collinson USENIX Association Institutional Representative
-
- Scott Anderson Carol J. Harkness Jim R. Oldroyd
- Helene Armitage Craig Harmer Craig Partridge
- David Athersych Dale Harris Rob Peglar
- Geoff Baldwin Myron Hecht John C. Penney
- Jerome E. Banasik Morris J. Herbert Rand S. Phares
- Steven E. Barber David F. Hinnant P. J. Plauger
- Robert M. Barned Lee A. Hollaar Gerald Powell
- David R. Bernstein Ronald Holt Jr. Scott E. Preece
- Kabekode V. S. Bhat Randall Howard James M. Purtilo
- Robert Bismuth Jim Isaak J. S. Quarterman
- Jim Blondran Richard James Wendy Rauch-Hindin
- Robert Borochoff Hal Jespersen Brad Rhoades
- Keith Bostic Greg Jones Christopher J. Riddick
- James P. Bound Michael J. Karels Andrew K. Roach
- Joseph Boykin Lorraine C. Kevra Arnold Robbins
- Kevin Brady Alan W. Kiecker R. Hughes Rowlands
- Phyllis Eve Bregman Jeff Kimmel Robert Sarr
- A. Winsor Brown M. J. Kirk Norman Schneidewind
- F. Lee Brown Jr. Kenneth C. Klingman Wolfgang Schwabl
- Luis-Felipe Cabrera Joshua W. Knight Richard Scott
- Nicholas A. Camillone David Korn Glen Seeds
- Andres Caravallo Takahiko Kuki Dan Shia
- Steven L. Carter Robin B. Lake Roger Shimada
- John Caywood Mike Lambert Mukesh Singhal
- Kilnam Chon Doris Lebovits Richard Sniderman
- Chan F. Chong Maggie Lee Steven Sommars
- Robert L. Claeson Greger Leijonhufvud Bryan W. Sparks
- Mark Colburn Robert M. Lenk Richard Stallman
- Kenneth N. Cole David Lennert Daniel Steinberg
- Richard Cornelius Mark E. Levine Douglas H. Steves
- William M. Corwin Kevin Lewis Peter Sugar
- Mike R. Cossey Kin F. Li Scott A. Sutter
- William Cox James P. Lonjers Ravi Tavakley
- Donald W. Cragun Joseph F. P. Luhukay Donn Terry
- Terence Dowling Paul Lustgarten Gary F. Tom
- Stephen A. Dum Ron Mabe A. T. Twigger
- John D. Earls Robert J. Makowski Mark-Rene Uchida
- Ron Elliott Roger J. Martin L. David Umbaugh
- Richard W. Elwood Joberto S. B. Martins Michael W. Vannier
- David Emery Yoshihiro Matsumoto M. B. Wagner
- Philip H. Enslow Shane McCarron John W. Walz
- Ken Faubel Martin J. McGowan III Alan G. Weaver
- Terence Fong Marshall Kirk McKusick Larry Wehr
- Ed Frankenberry Robert W. McWhirter Bruce Weiner
- John A. Gertwagen Doug Michels Brian Weis
- Al Gettier Gary W. Miller Peter J. Weyman
- Michel Gien James M. Moe Andrew E. Wheeler
- Gregory W. Goddard J. W. Moore David Willcox
- Robert C. Groman Anita Mundkur Jeff Wubik
- Judy Guist Martha Nalebuff Oren Yuen
- Gregory Guthrie Fred Noz Jason Zions
- Michael J. Hannah Alan F. Nugent
-
- When the IEEE Standards Board approved this standard on <date to be
- provided>, it had the following membership:
-
-
-
-
-
-
-
- (to be pasted in by IEEE)
-
-
-
-
-
-
-
- END_RATIONALE
-
- P1003.2/D11.2
-
-
-
-
-
-
-
-
-
-
-
-
- Information technology -- Portable Operating System Interface (POSIX) --
- Part 2: Shell and Utilities
-
-
-
-
-
-
-
-
- Section 1: General
-
-
-
- 1.1 Scope
-
- This standard defines a standard source code level interface to command
- interpretation, or ``shell,'' services and common utility programs for
- application programs. These services and programs are complementary to
- those specified by ISO/IEC 9945-1: 1990 {8}, hereinafter referred to as
- ``POSIX.1 {8}.''
-
- The standard has been designed to be used by both application programmers
- and system implementors. However, it is intended to be a reference
- document and not a tutorial on the use of the services, the utilities, or
- the interrelationships between the utilities.
-
- The emphasis of this standard is on the shell and utility functionality
- required by application programs (including ``shell scripts'') and not on
- the direct interactive use of the shell command language or the utilities
- by humans.
-
- Portions of this standard comprise optional language bindings to system
- service interfaces. See, for example, the C Language Bindings Option in
- Annex B. This standard is intended to describe language interfaces and
- utilities in sufficient detail so that an application developer can
- understand the required interfaces without access to the source code of
- existing implementations on which they may be based. Therefore, it does
- not attempt to describe the source programming language or internal
- design of the utilities; they should be considered ``black boxes'' that
- exhibit the described functionality.
-
- For language interfaces, or functions, this standard has been defined
- exclusively at the source code level. The objective is that a conforming
- portable application source program can be translated to execute on a
- conforming implementation. The standard assumes that the source program
- may need to be retranslated to produce target code for a new environment
- prior to execution in that environment.
-
- There is no requirement that the base operating system supporting the
- shell and utilities be one that fully conforms to ISO/IEC 9945-1: 1990
- {8}. (The base system could contain a subset of POSIX.1 {8}
- functionality, enough to support the requirements for this standard, as
- described in 2.9.1, but that could not claim full conformance to all of
- POSIX.1 {8}.) Furthermore, there is no requirement that the shell
- command interpreter or any of the standard utilities be written as
- POSIX.1 {8} conforming programs, or be written in any particular
- language.
-
- Although not requiring a fully conforming POSIX.1 {8} base, this standard
- is based upon documentation and the knowledge of existing programs that
- assume an interface and architecture similar to that described by
- POSIX.1 {8}. Any questions regarding the definition of terms or the
- semantics of an underlying concept should be referred to POSIX.1 {8}.
-
- BEGIN_RATIONALE
-
-
- 1.1.1 Scope Rationale. (This subclause is not a part of P1003.2)
-
- This standard is one of a family of related standards. The term POSIX is
- correctly used to describe this family, and not only its foundation, the
- operating system interfaces of POSIX.1 {8}. Therefore, POSIX.2 could
- colloquially be described as the ``POSIX Shell and Tools Standard.''
-
- The interfaces documented for this standard are to and from high-level
- language application programs and to and from the utilities themselves;
- the standard does not directly address the interface with users.
-
- The ``source code'' interface to the command interpreter is defined in
- terms of high-level language functions in 7.1.1 or 7.1.2 (such as
- system(), B.3.1, or popen(), B.3.2). There are also other function
- interfaces, such as those for matching regular expressions in 7.3
- (regcomp() in B.5). Many of the utilities in this standard, and the
- shell itself, also accept their own command languages or complex
- directives as input data, which is also referred to as source code. This
- data, an ordered series of characters, may be stored in files, or
-
- ``scripts,'' that are portable between systems without true
- recompilation. However, just as with POSIX.1 {8}, the standard addresses
- only the issue of source code portability between systems; applications
- using these calls may have to be recompiled or translated when moving
- from one system to another.
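
As an illustration of the function-level interface to the command
interpreter described above, the following is a minimal sketch in C. It is
not text from the standard. The helper names run_status() and first_line()
are hypothetical, and the _POSIX_C_SOURCE value is an assumption suited to
a present-day system rather than to this draft.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: modern feature-test value */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper: run cmd through the shell with system() and
 * return its exit status, or -1 if the command could not be run or
 * did not exit normally. */
int run_status(const char *cmd)
{
    int status;

    assert(cmd != NULL);
    status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Hypothetical helper: run cmd through the shell with popen() and copy
 * the first line of its standard output, without the trailing newline,
 * into buf. Returns 0 on success and -1 on failure. */
int first_line(const char *cmd, char *buf, size_t size)
{
    FILE *fp;
    int got_line;

    assert(cmd != NULL && buf != NULL);
    if (size == 0)
        return -1;

    fp = popen(cmd, "r");
    if (fp == NULL)
        return -1;

    got_line = (fgets(buf, (int)size, fp) != NULL);

    /* Always close the stream so the child process is reaped. */
    if (pclose(fp) == -1 || !got_line)
        return -1;

    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

A caller might use run_status("exit 3") to obtain the shell's exit status,
or first_line("echo hello", buf, sizeof buf) to capture a single line of
command output. In both cases the command string is itself shell source
code, which is the second sense of ``source code'' discussed above.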
-
- There has been considerable debate concerning the appropriate scope of
- the work represented by this standard. The following are rational
- alternatives that have been evaluated:
-
- (1) Define the shell and tools as extensions to POSIX.1 {8}. This
- would require a full conforming POSIX.1 {8} system as a base for
- the new facilities described here. Vocal proponents for this
- view have been the members of the POSIX.3 working group, who
- foresaw difficulties in producing a verification suite standard
- without having a known operating system base.
-
- (2) Decouple the shell and tools entirely from POSIX.1 {8}. This
- would potentially allow the standard to be implemented on such
- popular operating systems as MVS/TSO, VM/CMS, MS-DOS, VMS, etc.
- Those systems would not have to provide every minor detail of
- the POSIX.1 {8} language interfaces to conform under this
- model--only enough to support the shell and tools.
-
- (3) Compromise between options 1 and 2. Base the standard on an
- interface similar to POSIX.1 {8}, but don't require full
- conformance. A simple example would be a Version 7 UNIX System,
- which could not conform to POSIX.1 {8} without considerable
- modification. However, a vendor could support all of the
- features of this standard without changing its kernel or binary
- compatibility. Another example would be a system that conformed
- to all stated POSIX.1 {8} interfaces, but that didn't have a
- fully conforming C Standard {7} compiler. The difficulty with
- this option is that it makes the stated goal of the working
- group a bit fuzzier and increases the amount of analysis
- required for the features included.
-
- The working group selected option 3 as its goal. It chose to retain the
- full UNIX system-like orientation, but did not wish to arbitrarily
- exclude legitimate systems that could almost conform. No useful features
- of shells or commonly-used utilities were discarded to accommodate
- nonconforming base systems; on the other hand, no deliberate obstacles
- were arbitrarily erected. Furthermore, POSIX.1 {8} is still required for
- its definitions and architectural concepts, which are purposely not
- repeated in this standard.
-
- One concrete example of how the two standards interrelate is in the usage
- of POSIX.1 {8} function names in the descriptions of utilities in
- POSIX.2. There are a number of historical commands that directly mapped
- into one of the UNIX system calls. For example: chmod and chmod(); ln
- and link(). The POSIX.2 working group was faced with the problem of
- having to define all of the complex interactions ``behind the scenes''
- for some simple commands. Creating a file, for example, involves many
- POSIX.1 {8} concepts, including processes, user IDs, multiple group
- permissions (which are optional), error conditions, etc. Rather than
- enumerating all of these interactions in many places, the POSIX.2 group
- chose to employ the POSIX.1 {8} function descriptions, where appropriate.
- See the chmod utility in 4.7 as an example. The utility description
- includes the phrase:
-
- ... performing actions equivalent to the chmod() function as
- defined in the POSIX.1 {8} chmod() function:
-
- This means that the POSIX.2 implementor has to read the POSIX.1 {8}
- chmod() description and fully understand all of its functionality,
- requirements, and side effects, which now don't have to be repeated here.
- (Admittedly, this makes the POSIX.2 standard a bit more difficult to
- read, but the working group felt that precision transcended the need for
- readable or semi-tutorial documents.)
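-
- To make the relationship concrete, the sketch below uses the chmod
- utility from a script; on a conforming system the effect on the file is
- expected to match a call to the POSIX.1 {8} chmod() function with the
- corresponding mode bits. The scratch file name and the variable perms
- are assumptions made for this example.
-
```shell
#!/bin/sh
# Scratch file; the name is arbitrary and chosen only for this demo.
f="${TMPDIR:-/tmp}/posix2_chmod_demo.$$"
: > "$f"

# Symbolic mode: owner read/write, nothing for group and others.
# Equivalent in effect to chmod(path, S_IRUSR | S_IWUSR) in POSIX.1.
chmod u=rw,go= "$f"

# The first ten characters of "ls -l" output show the file type and
# the permission bits.
perms=$(ls -l "$f" | cut -c1-10)
echo "$perms"

rm -f "$f"
```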
-
- The Introduction states that one of the goals of the working group was:
- ``This interface should be implementable on conforming POSIX.1 {8}
- systems.'' This implies that the working group has attempted to ensure
- that no additional functionality or extension is required to implement
- this standard on the base defined by POSIX.1 {8}. This is not to say
- that extensions are not allowed, but that they should not be necessary.
- The goal ``(7) Utilities and standards for the installation of
- applications'' was once interpreted to mean that an elaborate series of
- tools was required to install and remove applications, based on complex
- description files and system databases of capabilities. An attempt to
- provide this was rejected by the balloting group and that type of system
- is now being evaluated by the POSIX.7 System Administration group.
- However, the original goal remains in the list, because many of the
- standard utilities are, in fact, targeted specifically for application
- installation--make, c89, lex, etc.
-
-
- 1.1.1.1 Existing Practice. (This subclause is not a part of P1003.2)
-
- The working group would have been very happy to develop a standard that
- allowed all historical implementations (i.e., those existing prior to the
- time of publication) to be fully conforming and all historical
- applications to be Strictly Conforming POSIX Shell Applications without
- requiring any changes. Some modifications will be required to reconcile
- the specific differences between historical implementations; there are
- many divergent versions of UNIX systems extant and applications have
- sometimes been written to take advantage of features (or bugs) on
- specific systems. Therefore, the working group established a set of
- goals to maximize the value of the standard it eventually produced.
- These goals are enumerated in the following subclauses. They are listed
- in approximate priority sequence, where the first subclause is the most
- important portability goal.
-
- 1.1.1.1.1 Preserve Historical Applications
-
- The most important priority was to ensure that historical applications
- continued to operate on conforming implementations. This required the
- selection of many utilities and features from the most prevalent
- historical implementations. The working group is relying on the
- following factors:
-
- (1) Many inconsistent historical features will still be supported as
- obsolescent.
-
- (2) Common features of System V and BSD will continue to be
- supported by their sponsors, even if they aren't included here
- (just as long as they are not prevented from existing).
-
- Therefore, the standard was written so that the large majority of well-
- written historical applications should continue to operate as Conforming
- POSIX Shell Applications Using Extensions.
-
- 1.1.1.1.2 Clean Up the Interfaces
-
- The working group chose to extend the benefits of historical UNIX systems
- by making limited improvements to the utility interfaces; numerous
- complaints have been heard over the years about the inconsistencies in
- the command line interface, which have allegedly made it harder for
- novice users. Given the constraints of Preserve Historical Applications,
- the working group has made the following general modifications:
-
- (1) Utilities have been extended to deal with differences in
- character sets, collating sequences, and some cultural aspects
- relating to the locale of the user. (Examples: new features in
- regular expressions; new formatting options in date; see 4.15.)
-
- (2) The utility syntax guidelines in 2.10.2 have been applied to
- almost all of the utilities to promote a consistent interface.
- The guidelines themselves have been loosened up a bit from their
- counterparts in the SVID. In many cases historical utilities
- have not conformed with these guidelines (which were written
- considerably later than the utilities themselves). The older
- interfaces have been maintained in the standard as obsolescent
- features. (Examples: join, sort.) However, in some cases,
- such as dd and find, such major surgery was required that the
- working group decided to leave the historical interfaces as is.
- ``Fixing'' the interface would mean replacing the command, which
- would not help applications portability. So, fixing was limited
- to relatively minor abuses of the new guidelines, where
- reasonable consistency could be achieved while still maintaining
- the general type of interface of the historical version.
-
- (3) Features that were not generally portable across machine
- architectures or systems have been removed or marked obsolescent
- and new, more portable interfaces have been introduced.
- (Examples: the octal number methods of describing file modes in
- chmod and other utilities have been marked obsolescent; the
- symbolic ``ugo'' method has been extended to other utilities,
- such as umask.)
-
- (4) Features that have proved to be popular in some specific UNIX
- system variants have been adopted. (Examples: diff -c, which
- originated in BSD systems, and the ``new'' awk, from System V.)
- Such features were selected given the requirements for balloting
- group consensus; the features had to be used widely enough to
- balance accusations of ``creeping featurism'' and violations of
- the UNIX system ``tools philosophy.''
-
- (5) Unreasonable inconsistencies between otherwise similar
- interfaces have been reconciled. (Example: methods of
- specifying the patterns to the three grep-related utilities have
- been made more consistent in the standard's single grep.)
-
- (6) When irreconcilable differences arose between versions of
- historical utilities, new interfaces (utility names or syntax)
- were sometimes added in their places. The working group
- resisted the urge to deviate significantly from historical
- practice; the new interfaces are generally consistent with the
- philosophy of historical systems and represent comparable
- functionality to the interfaces being replaced. In some cases,
- System V and BSD had diverged (such as with echo and sum) so
- significantly that no compromises for a common interface were
- possible. In these cases, either the divergent features were
- omitted or an entirely new command name was selected (such as
- with printf and cksum).
-
- (7) Arbitrary limits to utility operations have been removed.
- (Example: some historical ed utilities have very limited
- capabilities for dealing with large files or long input lines.)
-
- (8) Arbitrary limitations on historical extensions have been
- eliminated. (Example: regular expressions have been described
- so that the popular \< ... \> extension is allowed.)
-
- (9) Input and output formats have been specified in more detail than
- historical implementations have required, allowing applications
- to more effectively operate in pipelines with these utilities.
- (Example: comm.)
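-
- A few of the interfaces named in the list above can be tried from a
- script. The sketch below is illustrative only and assumes a shell and
- utilities that already provide the POSIX.2 forms: symbolic umask output
- (item 3), printf in place of the divergent echo (item 6), and the -E and
- -F options of the single grep (item 5). All variable names and sample
- data are invented for the example.
-
```shell
#!/bin/sh
# Item (3): symbolic file modes instead of octal numbers.
umask u=rwx,g=rx,o=
mask=$(umask -S)            # symbolic form of the current mask

# Item (6): printf has one defined behavior for escapes and newlines,
# unlike the System V and BSD variants of echo.
line=$(printf '%s-%03d\n' part 7)

# Item (5): one grep, with options selecting the pattern type that
# used to require egrep or fgrep.
words='alpha
a.c
abc'
ere_hits=$(printf '%s\n' "$words" | grep -E -c '^a(lpha|bc)$')
fixed_hits=$(printf '%s\n' "$words" | grep -F -c 'a.c')

echo "$mask $line $ere_hits $fixed_hits"
```
-
- With -F the period in a.c is an ordinary character, so only the
- literal line matches; with -E the alternation selects the two lines
- alpha and abc.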
-
- Thus, in many cases the working group could be accused of ``violating
- Existing Practice,'' and in fact received some balloting objections to
- that effect from implementors (although rarely from users or application
- developers). The working group was sensitive to charges that it was
- engaged in arbitrary software engineering rather than merely codifying
- existing practice. When changes were made, they were always written to
- preserve historical applications, but to move new conforming applications
- into a more consistent, portable environment. This strategy obviously
- requires changes to historical implementations; the working group
- carefully evaluated each change, weighing the value to users against the
- one-time costs of adding the new interfaces (and of possibly breaking
- applications that took advantage of bugs), generally siding with the
- users when the costs to implementations and applications were not
- excessively high.
-
- In some cases, changes were reluctantly made that could conceivably break
- some historical applications; the working group allowed these only in the
- face of practices it considered rare or significantly misguided.
-
- 1.1.1.1.3 Allow Historical Conforming Applications
-
- It is likely that many historical shell scripts will be Strictly
- Conforming POSIX.2 Applications without requiring modifications.
- Developers have long been aware of the differences among the historical
- UNIX system variants and have avoided the nonportable aspects to increase
- the scope of their applications' marketplace. However, the previous goal
- of a consistent interface was considered to be quite important, so there
- will be modifications required to some applications if they wish to be
- maximally portable in the future.
-
- 1.1.1.1.4 Preserve Historical Implementations
-
- As explained in 1.1.1.1.2, the requirements for portability and a
- consistent interface have caused the working group to add new utilities
- and features. No historical implementations contained all of the
- attributes required by the working group. Therefore, this lowest
- priority goal fell victim to the preceding goals, and every known
- historical implementation will require some modifications to conform to
- this standard.
-
- The working group took care to ensure that the implementations could add
- the new or modified features without breaking the operation of existing
- applications. (Note that the standard utilities are not considered
- applications in this regard, but are part of the implementation. In
- fact, many or most of the utilities named by this standard will have to
- change to some extent.)
-
-
- 1.1.1.2 Outside the Scope. (This subclause is not a part of P1003.2)
-
- The following areas are outside the scope of this standard. This
- subclause explains more of the rationale behind the exclusions. (It
- should be noted that this is not an official list. It was not part of
- the Project Authorization Request submitted to the IEEE, but was devised
- as a guide to keep the working group discussions on track.)
-
- (1) Operating system administrative commands (privileged processes,
- system processes, daemons, etc.).
-
- The working group followed the lead of the POSIX.1 {8} group in
- this instance. Administrative commands were felt to be too
- implementation dependent and not useful for application
- portability. Subsequent to this decision, a separate POSIX.7
- working group was formed to deal with this area of ``operator
- portability.'' It is anticipated that utilities needed for
- system administration will be closely coordinated with the
- POSIX.2 working group.
-
- (2) Commands required for the installation, configuration, or
- maintenance of operating systems or file systems.
-
- This area is similar to item (1). System installation differs
- from the application installation portion of the Scope in that
- it concerns installing the operating system itself rather than
- application programs. The exclusion of operating system
- installation facilities should not be interpreted to mean that
- the application installation procedures cannot be used for
- installing operating system components. The proposed interface
- for this area encountered stiff resistance from the balloting
- group in Draft 8 and was temporarily withdrawn. As described in
- Annex E.4, a decision of the balloting group is pending on
- whether to begin work on a supplement to this standard
- (POSIX.2b) for application installation.
-
- (3) Networking commands.
-
- These were excluded because they are deeply involved with other
- standards making bodies and are probably too complicated. In
- this case, several working groups were formed within the POSIX
- family to deal with this. It is anticipated that utilities
- needed for networking, if any, will be closely coordinated with
- the POSIX.2 working group. (In early drafts of this standard,
- which predated the formation of the networking-specific POSIX
- working groups, the historical ``UNIX system to UNIX system copy
- [UUCP]'' programs and protocols were included. These
- descriptions have been removed in deference to a more
- appropriate working group.)
-
- (4) Terminal control or user-interface programs (e.g., visual
- shells, visual editors, window managers, command history
- mechanisms, etc.).
-
- This is probably the most contentious exclusion. A common
- complaint about many UNIX systems is how they're not very ``user
- friendly.'' Some people have hoped that the interface to users
- could be standardized with mice, icon-based desktop metaphors,
- and so forth. This standard neatly sidesteps those concerns by
- reminding its audience that it is an application portability
- standard, and therefore has little relationship to the manner in
- which users manage their terminals.
-
- However, this guideline was not meant to apply to applications.
- It is perfectly reasonable for an application to assume it can
- have a user interacting with it. That is why such facilities as
- displaying strings (with printf) without <newline>s, stty, and
- various prompting utilities are included in the standard.
-
- The interfaces in this standard are very oriented to command
- lines being issued by shell scripts, or through the system() or
- popen() functions. Therefore, interactive text editors, pagers,
- and other user interface tools have been omitted for now.
- Alternatively, other standards bodies, such as X3H3.6 and the
- IEEE TCOS P1201 working group, are devising interfaces that
- could possibly be more useful and long-lived than any prescribed
- by POSIX.2.
-
- There is one area of this subject that will be addressed by
- POSIX.2. The scope of the working group has been expanded to
- include what is being termed the User Portability Extension,
- POSIX.2a. This will be published as a supplement to this
- standard and have the goal of providing a portable environment
- for relatively expert time-sharing or software development
- users. It will not attempt to deal with mice or windows or
- other advanced interfaces at this time, but should cover many of
- the terminal-oriented utilities, such as a full-screen editor,
- currently avoided by this edition of POSIX.2.
-
- (5) Graphics programs or interfaces.
-
- See the comments on user interface, above.
-
- (6) Text formatting programs or languages.
-
- The existing text formatting languages are generally too
- primitive in scope to satisfy many users, who have relied on a
- myriad of macro languages. There is an ISO standard text
- description language, SGML, but this has had insufficient
- exposure to the UNIX system community for standardization as
- part of POSIX at this time.
-
- (7) Database programs or interfaces (e.g., SQL, etc.).
-
- These interfaces are the province of other standards bodies.
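-
- Item (4) notes that applications may still interact with a user, which
- is why printf can emit a string without a trailing <newline>. A minimal
- sketch of such a prompt follows; the function name ask, the prompt
- text, and the sample reply are hypothetical.
-
```shell
#!/bin/sh
# ask PROMPT: write PROMPT without a trailing <newline>, then read one
# line of reply from standard input and print it on standard output.
ask() {
    printf '%s ' "$1" >&2       # prompt goes to standard error
    IFS= read -r reply || reply=
    printf '%s\n' "$reply"
}

# Example call; here the reply is supplied by a here-document so the
# script can run unattended.
answer=$(ask 'Install to which directory?' <<EOF
/opt/demo
EOF
)
echo "chosen: $answer"
```
-
- Writing the prompt to standard error keeps it out of the value
- captured by the command substitution.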
-
-
- 1.1.1.3 Language-Independent Descriptions. (This subclause is not a part
- of P1003.2)
-
- The POSIX.1 {8} and POSIX.5 working groups are currently engaged in
- developing the model for language-independent descriptions of system
- services. When complete, it will allow the C language bias of the
- POSIX.1 {8} standard to be excised and C will take its place among other
- language bindings that interface with the core services descriptions.
- The POSIX.2 working group did not wish to duplicate effort, and has
- therefore waited until POSIX.1 {8} achieves progress in this area. Thus,
- like the first version of POSIX.1 {8}, the initial drafts of POSIX.2
- start life as a C-only standard, with language independence scheduled to
- be included in a later draft. Fortunately, this standard is
- substantially less involved with C than POSIX.1 {8} is. In fact, all of
- the C interfaces are entirely optional.
-
- 1.1.1.4 Base Documents. (This subclause is not a part of P1003.2)
-
- The working group consulted a number of documents in the course of its
- deliberations, to select utilities and features. There were five primary
- documents that started off the process:
-
- (1) The System V Interface Definition (SVID), Issue 2, Volume 2.
-
- (2) The X/Open Portability Guide (XPG), Issues II and III, Volume
- 1.
-
- (3) The UNIX User's Reference Manual, 4.3 Berkeley Software
- Distribution, Virtual VAX-11 Version. (The printed
- documentation as well as the online versions provided with the
- BSD ``Tahoe'' and ``Reno'' distributions were considered as one
- base document for the POSIX.2 work.)
-
- (4) The KornShell Command and Programming Language, by Bolsky and
- Korn.
-
- (5) The AWK Programming Language, by Aho, Kernighan, and Weinberger.
-
- The XPG was used most heavily in initial deliberations about which
- utilities and features to include. The X/Open companies had done a very
- thorough job in analyzing the SVID and other standards to compile a list
- of the most useful and portable utilities. They carefully marked many
- features that had portability problems and the working group avoided them
- for this standard.
-
- AT&T, X/Open, and Berkeley provided machine-readable documentation for
- the use of the working group. However, due to very substantial
- differences in formatting standards, there is little resemblance between
- some of the utilities described here and their cousins in the SVID, XPG,
- and BSD user manual. Nevertheless, early usage of these documents was an
- invaluable aid in the production of the standard and the POSIX.2 working
- group extends its sincere thanks to all three organizations for their
- generous cooperation.
-
- The biggest divergence in POSIX.2's documentation has been its philosophy
- of fully specifying interfaces. The SVID and XPG are oriented solely
- towards application portability. Implementors would have a difficult
- time writing some of these utilities from the descriptions alone. In
- fact, both documents freely rely on the potential implementors licensing
- the source code for the reference systems to complete the specification.
- The POSIX.2 standard, on the other hand, also has implementors in its
- audience and it strove to expand its descriptions wherever useful and
- feasible. For example, it makes use of BNF grammars to describe complex
- syntaxes. It attempts to describe the interactions between options,
- operands, and environment variables, where conflicts can exist. It also
- attempts to describe all of the useful utility input and output formats.
- The goal here was to allow application developers to write filters or
- other programs that could parse the output of any of these utilities or
- to provide meaningful input from their programs. To the working group's
- knowledge, this is a task never before attempted for the historical UNIX
- system commands--the source code was always so readily available to anyone
- who really needed to know this information.
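-
- Because output formats are specified, a script can consume a utility's
- output without guessing at its layout. The sketch below, offered only
- as an illustration, splits the output of cksum (a checksum followed by
- an octet count and, for file operands, a pathname) into fields. The
- variable names and the sample data are assumptions.
-
```shell
#!/bin/sh
# Sample data read from standard input, so cksum prints only the
# checksum and the octet count.
out=$(printf 'hello\n' | cksum)

# Split the blank-separated fields into positional parameters.
set -- $out
crc=$1
octets=$2

echo "checksum=$crc octets=$octets"
```
-
- The unquoted expansion of out is deliberate: field splitting is what
- separates the two numbers.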
-
- The two commercial books listed were used as reference materials in
- preparing information on the shell and the awk language that was more
- recent and complete than AT&T's or X/Open's documentation.
-
-
- 1.1.1.5 History. (This subclause is not a part of P1003.2)
-
- The 1984 /usr/group Standard was originally intended to include the shell
- and user level commands. However, the /usr/group (now known as
- ``UniForum'') Standards Committee was unable to begin this effort, due to
- the complexity of the system call and library functions that it
- eventually did publish.
-
- A shell was referred to in the system() function defined by the
- ANSI/X3.159-1989 Programming Language C Standard, but no syntax for the
- shell command language was attempted.
-
- As the first version of POSIX.1 {8} neared completion, it became apparent
- that the usefulness of POSIX would be diminished if no shell or utilities
- were defined. Therefore, the POSIX.2 working group was formed in January
- 1986 at the Denver, Colorado, meeting of POSIX.1 {8} to address this
- concern.
-
- The progress of the working group has seemed rather slow during the more
- than three years of its existence. This is primarily because its
- membership had substantial overlap with the POSIX.1 {8} working group;
- for example, the Chair of POSIX.2 was also the Technical Editor of
- POSIX.1 {8} (and POSIX.2 as well!) at the time. In addition, meetings were
- arbitrarily shortened to allow the POSIX.1 {8} group to move forward as
- quickly as possible.
-
-
- 1.1.1.6 Internationalization. (This subclause is not a part of P1003.2)
-
- Some of the utilities and concepts described in this standard contain
- requirements that standardize multilingual and multicultural support.
- Most of the internationalized support for this standard was proposed by
- the UniForum Technical Committee Subcommittee on Internationalization, at
- the request of the POSIX.2 working group.
-
- UniForum, a nonprofit organization, organizes subcommittees of Technical
- Committees to do standards research on different topics pertinent to
- POSIX. The UniForum Subcommittee on Internationalization is one such
- group. It was formed to propose and promote standard internationalized
- extensions to POSIX-based systems. The POSIX.2 working group and the
- UniForum Subcommittee on Internationalization coordinated their work by
- the use of liaison members, who attended the meetings of both groups.
- The interaction between the two groups started when POSIX.2 asked the
- Subcommittee on Internationalization to provide internationalized support
- for regular expressions. Later, the Subcommittee on Internationalization
- was charged with identifying areas in the standard needing changes for
- internationalized support and proposing those changes.
-
- 1.1.1.7 Test Methods. (This subclause is not a part of P1003.2)
-
- The POSIX.3 working group has worked on a test methods specification for
- verifying conformance to POSIX standards in general and POSIX.1 {8} and
- POSIX.2 in particular. Test methods for POSIX.2 should be published as a
- separate document sometime after POSIX.2 is approved. (See the Foreword
- for information on the activities of other POSIX working groups.)
-
- 1.1.1.8 Organization of the Standard. (This subclause is not a part of
- P1003.2)
-
- The standard document is organized into sections. Some of these, such as
- the Scope in 1.1, are mandated by ISO/IEC, the IEEE, and other standards
- bodies. The remainder of the document is organized into small sections
- for the convenience of the working group and others. It has been
- suggested that all of the utility descriptions (and maybe the functions,
- too) should be lumped into one large section, all in alphabetical order.
- This would presumably make it easier for some users to use the document
- as a reference document. The working group deliberately chose to not
- organize it in this way, for the following reasons:
-
- (1) Certain sections are optional. It is more convenient for the
- document's internal references, and also for people specifying
- systems, if these optional sections are in large pieces, rather
- than a detailed list of utility names.
-
- (2) Future supplements to this standard will be adding new utilities
- that will also be optional. It would be confusing to try to
- merge documents at a level below major sections (chapters).
-
- END_RATIONALE
-
-
-
- 1.2 Normative References
-
- The following standards contain provisions which, through references in
- this text, constitute provisions of this standard. At the time of
- publication, the editions indicated were valid. All standards are
- subject to revision, and parties to agreements based on this part of this
- International Standard are encouraged to investigate the possibility of
- applying the most recent editions of the standards listed below. Members
- of IEC and ISO maintain registers of currently valid International
- Standards.
-
- {1} ISO/IEC 646: 1983, Information processing--ISO 7-bit coded
- character set for information interchange. (Under revision. This
- notation is meant to explicitly reference the 1990 Draft
- International Standard version of ISO/IEC 646.)
-
- {2} ISO 1539: 1980, Programming languages--FORTRAN.
-
- {3} ISO 4217: 1987, Codes for the representation of currencies and
- funds.
-
- {4} ISO 4873: 1986, Information processing--ISO 8-bit code for
- information interchange--Structure and rule for implementation.
-
- {5} ISO 8859-1: 1987, Information processing--8-bit single-byte coded
- graphic character sets--Part 1: Latin alphabet No. 1.
-
- {6} ISO 8859-2: 1987, Information processing--8-bit single-byte coded
- graphic character sets--Part 2: Latin alphabet No. 2.
-
- {7} ISO/IEC 9899: 1990, Information processing systems--Programming
- languages--C.
-
- {8} ISO/IEC 9945-1: 1990, Information technology--Portable Operating
- System Interface (POSIX)--Part 1: System Application Program
- Interface (API) [C Language]
-
- ISO/IEC documents can be obtained from the ISO office, 1, rue de
- Varembé, Case Postale 56, CH-1211, Genève 20, Switzerland/Suisse.
-
-
-
- 1.3 Conformance
-
-
- 1.3.1 Implementation Conformance
-
- 1.3.1.1 Requirements
-
- A conforming implementation shall meet all of the following criteria:
-
- (1) The system shall support all required interfaces defined within
- this standard. These interfaces shall support the functional
- behavior described herein. The system shall provide the shell
- command language described in Section 3 and the utilities in
- Section 4.
-
- (2) The system may provide one or more of the following: the
- Software Development Utilities Option, the C Language Bindings
- Option, the C Language Development Utilities Option, the FORTRAN
- Development Utilities Option, or the FORTRAN Runtime Utilities
- Option. When an implementation claims that an optional facility
- is provided, all of its constituent parts shall be provided.
-
- (3) The system may provide additional or enhanced utilities,
- functions, or facilities not required by this standard.
- Nonstandard extensions should be identified as such in the
- system documentation. Nonstandard extensions, when used, may
- change the behavior of utilities, functions, or facilities
- defined by this standard. In such cases, the implementation's
- conformance document (see 2.2.1.2) shall define an execution
- environment (i.e., shall provide general operating instructions)
- in which an application can be run with the behavior specified
- by the standard. In no case shall such an environment require
- modification of a Strictly Conforming POSIX.2 Application.
-
-
- 1.3.1.2 Documentation
-
- A conformance document with the following information shall be available
- for an implementation claiming conformance to this standard. The
- conformance document shall have the same structure as this standard, with
- the information presented in the appropriately numbered sections;
- sections that consist solely of subordinate section titles, with no other
- information, are not required.
-
- The conformance document shall not contain information about extended
- facilities or capabilities outside the scope of this standard, unless
- those extensions affect the behavior of a Strictly Conforming POSIX.2
- Application; in such cases, the documentation required by the previous
- subclause shall be included.
-
- The conformance document shall contain a statement that indicates the
- full name, number, and date of the standard that applies. The
- conformance document may also list software standards approved by ISO/IEC
- or any ISO/IEC member body that are available for use by a Conforming
- POSIX.2 Application. It should indicate whether it is based on a fully-
- conformant POSIX.1 {8} system. Applicable characteristics where
- documentation is required by one of these standards, or by standards of
- government bodies, may also be included.
-
- The conformance document shall describe the symbolic values found in
- 2.13.2, stating values, the conditions under which those values can
- change, and the limits of such variations, if any.
-
- The conformance document shall describe the behavior of the
- implementation for all implementation-defined features defined in this
- standard. This requirement shall be met by listing these features and
- providing either a specific reference to the system documentation or
- providing full syntax and semantics of these features. When the value or
- behavior in the implementation is designed to be variable or customizable
- on each instantiation of the system, the implementation provider shall
- document the nature and permissible ranges of this variation. When
- information required by this standard is related to the underlying
- operating system and is already available in the POSIX.1 {8} conformance
- document, the implementation need not duplicate this information in the
- POSIX.2 conformance document, but may provide a cross-reference for this
- purpose.
-
- The conformance document may specify the behavior of the implementation
- for those features where this standard states that implementations may
- vary or where features are identified as undefined or unspecified.
-
- No specifications other than those described in this subclause (1.3.1.2)
- shall be present in the conformance document.
-
- The phrase ``shall be documented'' in this standard means that
- documentation of the feature shall appear in the conformance document, as
- described previously, unless the system documentation is explicitly
- mentioned.
-
- The system documentation should also contain the information found in the
- conformance document.
-
-
- 1.3.1.3 Conforming Implementation Options
-
- The following symbolic constants, described in 2.13.2, reflect
- implementation options for this standard that could warrant requirement
- by Conforming POSIX.2 Applications, or in specifications of conforming
- systems, or both:
-
- {POSIX2_SW_DEV}      The system supports the Software Development
-                      Utilities Option in Section 6.
-
- {POSIX2_C_BIND}      The system supports the C Language Bindings
-                      Option in Annex B.
-
- {POSIX2_C_DEV}       The system supports the C Language Development
-                      Utilities Option in Annex A.
-
- {POSIX2_FORT_DEV}    The system supports the FORTRAN Development
-                      Utilities Option in Annex C.
-
- {POSIX2_FORT_RUN}    The system supports the FORTRAN Runtime
-                      Utilities Option in Annex C.
-
- {POSIX2_LOCALEDEF}   The system supports the creation of locales as
-                      described in 4.35.
-
- Additional language bindings and development utility options may be
- provided in other related standards or in future revisions to this
- standard. In the former case, additional symbolic constants of the same
- general form as shown in this subclause should be defined by the related
- standard document and made available to the application, without
- requiring this POSIX.2 document to be updated.
-
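A script can probe the option constants of 1.3.1.3 at run time. The following is a minimal sketch, assuming that the getconf utility accepts the constant names without braces and writes either a value or the string undefined; the helper names option_status and report_options are hypothetical, and the exact text written for a supported option is expected to vary between implementations.

```shell
#!/bin/sh
# Classify the text written by `getconf NAME` for one option constant.
# Assumption: an unsupported option appears as empty output, as the
# string "undefined", or as -1; anything else is treated as supported.
option_status() {
    case $1 in
        '' | undefined | -1) echo unsupported ;;
        *) echo supported ;;
    esac
}

# Report every option constant named in 1.3.1.3.
report_options() {
    for name in POSIX2_SW_DEV POSIX2_C_BIND POSIX2_C_DEV \
                POSIX2_FORT_DEV POSIX2_FORT_RUN POSIX2_LOCALEDEF
    do
        # getconf may fail or be absent; treat that as "no value".
        value=$(getconf "$name" 2>/dev/null) || value=
        printf '%s: %s\n' "$name" "$(option_status "$value")"
    done
}

report_options
```

Keeping the classification separate from the getconf call lets an application adapt to the presence or absence of an option without depending on one implementation's output format.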
-
- 1.3.2 Application Conformance
-
- All applications claiming conformance to this standard fall within one of
- the following categories:
-
-
- 1.3.2.1 Strictly Conforming POSIX.2 Application
-
- A Strictly Conforming POSIX.2 Application is an application that requires
- only the facilities described in this standard (including any required
- facilities of the underlying operating system; see 2.9.1). Such an
- application:
-
- (1) shall accept any implementation behavior that results from
- actions it takes in areas described in this standard as
- implementation-defined or unspecified, or where the standard
- indicates that implementations may vary;
-
- (2) shall not perform any actions that are described as producing
- undefined results;
-
- (3) for symbolic constants, shall accept any value in the range
- permitted by this standard, but shall not rely on any value in
- the range being greater than the minimums listed in this
- standard;
-
- (4) shall not use facilities designated as obsolescent;
-
- (5) is required to tolerate, and is permitted to adapt to, the
- presence or absence of optional facilities whose availability is
- indicated by the constants in 2.13.1, or that are described
- using the verb may. However, an application requiring a
- high-level language binding option can only be considered at best a
- Conforming POSIX.2 Application; see 1.3.2.2.
-
- Within this standard, any restrictions placed upon a Conforming POSIX.2
- Application shall also restrict a Strictly Conforming POSIX.2
- Application.
-
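Item (3) of 1.3.2.1 can be illustrated with a line-length check. This is a sketch only: the figure 2048 is an assumption about the minimum that the standard lists for {LINE_MAX}, the names MIN_LINE_MAX and line_is_portable are hypothetical, and the length expansion counts characters, which equals bytes only in single-byte codesets.

```shell
#!/bin/sh
# Assumed minimum for {LINE_MAX}; a larger value reported by a given
# implementation is accepted but never relied upon.
MIN_LINE_MAX=2048

# Succeed if the line, plus its trailing <newline>, fits within the
# assumed minimum.
line_is_portable() {
    [ $(( ${#1} + 1 )) -le "$MIN_LINE_MAX" ]
}

if line_is_portable "a short line"; then
    echo "portable"
else
    echo "too long"
fi
```

The check deliberately ignores the value the local system reports, because a Strictly Conforming POSIX.2 Application may not depend on any limit exceeding the listed minimum.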
- 1.3.2.2 Conforming POSIX.2 Application
-
- The term Conforming POSIX.2 Application is used to describe either of the
- following two application types.
-
- 1.3.2.2.1 ISO/IEC Conforming POSIX.2 Application
-
- An ISO/IEC Conforming POSIX.2 Application is an application that uses
- only the facilities described in this standard (including the implied
- facilities of the underlying operating system; see 2.9.1) and approved
- conforming language bindings for any ISO/IEC standard. Such an
- application shall include a statement of conformance that documents all
- options and limit dependencies, and all other ISO/IEC standards used.
-
- 1.3.2.2.2 <National Body> Conforming POSIX.2 Application
-
- A <National Body> Conforming POSIX.2 Application differs from an ISO/IEC
- Conforming POSIX.2 Application in that it also may use specific standards
- of a single ISO/IEC member body referred to here as ``<National Body>.''
- Such an application shall include a statement of conformance that
- documents all options and limit dependencies, and all other <National
- Body> standards used.
-
-
- 1.3.2.3 Conforming POSIX.2 Application Using Extensions
-
- A Conforming POSIX.2 Application Using Extensions is an application that
- differs from a Conforming POSIX.2 Application only in that it uses
- nonstandard facilities that are consistent with this standard. Such an
- application shall fully document its requirements for these extended
- facilities, in addition to the documentation required of a Conforming
- POSIX.2 Application. A Conforming POSIX.2 Application Using Extensions
- shall be either an ISO/IEC Conforming POSIX.2 Application Using
- Extensions or a <National Body> Conforming POSIX.2 Application Using
- Extensions (see 1.3.2.2.1 and 1.3.2.2.2).
-
- BEGIN_RATIONALE
-
-
- 1.3.3 Conformance Rationale. (This subclause is not a part of P1003.2)
-
- These conformance definitions are closely related to those in
- POSIX.1 {8}.
-
- The term Conforming POSIX.2 Application and its variants were selected
- to parallel the terms used in POSIX.1 {8}.
-
- The descriptions of the ISO/IEC and <National Body> Conforming POSIX.2
- Applications are similar to the corresponding descriptions in POSIX.1 {8}.
- This is not a duplication of effort, as this standard relies on only a
- portion of POSIX.1 {8}, as explained in 1.1 and 2.9.1. Therefore,
- conformance to POSIX.2 has to be described separately from any
- conformance options or requirements in POSIX.1 {8}.
-
- A reference to a Language-Independent System Services Option was removed
- from the list of optional features that may be provided by the conforming
- implementation. There is no conformance value provided by that section,
- except as a reference point for functions actually provided by a real
- language binding. Therefore, the language binding sections are the ones
- that remain in the optional list. The Draft 8 section Language-Dependent
- Services for the C Programming Language was removed, as this subject is
- adequately, and appropriately, covered in Annex A.
-
- The documentation requirement for implementation extensions (``shall
- define an execution environment'') is simply meant to require that
- system-wide or per-user configuration options or environment variables
- that affect the operation of applications that use the standard utilities
- and functions be described in the conformance document. For example, if
- setting the (imaginary) LC_TRUTH variable causes changes in the exit
- status of true, the conformance document must describe this condition and
- how to avoid it--say, by unsetting the variable in the login script.
-
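The remedy suggested in the LC_TRUTH example might look like the sketch below. LC_TRUTH is the imaginary variable from the rationale, not a real locale category, and the function name is hypothetical.

```shell
#!/bin/sh
# Clear the imaginary LC_TRUTH variable, as a login script might, so
# that true behaves as the standard specifies. The subshell keeps the
# change from affecting the caller's environment.
run_true_in_standard_environment() {
    (
        unset LC_TRUTH
        true
        echo "exit status of true: $?"
    )
}

run_true_in_standard_environment    # prints "exit status of true: 0"
```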
- For further rationale on the types of conformance, see the POSIX.1 {8}
- Rationale.
-
- END_RATIONALE
-
-
-
- Section 2: Terminology and General Requirements
-
-
-
- 2.1 Conventions
-
-
- 2.1.1 Editorial Conventions
-
- This standard uses the following editorial and typographical conventions.
- A summary of typographical conventions is shown in Table 2-1.
-
- The Bold Courier font is used to show brackets that denote optional
- arguments in a utility synopsis, as in
-
-
-     cut [-c list] [file_name]
-
- These brackets shall not be used by the application unless they are
- specifically mentioned as literal input characters by the utility
- description.
-
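As a concrete reading of the cut synopsis, the brackets are dropped and the operand names replaced when the command is typed. The sketch below is illustrative only; the function name second_to_fourth and the sample input are hypothetical.

```shell
#!/bin/sh
# Synopsis form:  cut [-c list] [file_name]
# Typed form:     cut -c 2-4
# No brackets are typed; "list" is replaced by 2-4, and file_name is
# omitted, so standard input is read.
second_to_fourth() {
    cut -c 2-4
}

printf 'abcdef\n' | second_to_fourth    # prints "bcd"
```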
- There are two types of symbols enclosed in angle brackets (< >):
-
- C-Language Headers   The header name is in the Courier font, such as
-                      <sys/stat.h>. When coding C programs, the
-                      brackets are used as required by the language.
-
- Parameters           Parameters, also called metavariables, are in
-                      italics, such as <directory pathname>. The
-                      entire symbol, including the brackets, is meant
-                      to be replaced by the value of the symbol
-                      described within the brackets.
-
- Numbers within braces, such as ``POSIX.1 {8},'' represent cross
- references to the Normative References clause (see 1.2). If the number
- is preceded by a B, it represents a Bibliographic entry (see Annex D).
- Bibliographic entries are for information only.
-
- In some examples, the Bold Courier font is used to indicate the system's
- output that resulted from some user input, shown in Courier.
-
-
-
-               Table 2-1 - Typographical Conventions
- ___________________________________________________________________
- Reference                             Example
- ___________________________________________________________________
-
- C-Language Data Type                  long
- C-Language Function                   system()
- C-Language Function Argument          arg1
- C-Language Global External            errno
- C-Language Header                     <sys/stat.h>
- C-Language Keyword                    #define
- Cross Reference: Annex                Annex A
- Cross Reference: Clause               2.3
- Cross Reference: Other Standard       ISO 9999-1 {n}
- Cross Reference: Section              Section 2
- Cross Reference: Subclause            2.3.4, 2.3.4.5, 2.3.4.5.6
- Defined Term                          (see text)
- Environment Variable                  PATH
- Error Number                          [EINTR]
- Example Input                         echo foo
- Example Output                        foo
- Figure Reference                      Figure 7
- File Name                             /tmp
- Parameter                             <directory pathname>
- Special Character                     <newline>
- Symbolic Constant, Limit              {_POSIX_VDISABLE}, {LINE_MAX}
- Table Reference                       Table 6
- Utility Name                          awk
- Utility Operand                       file_name
- Utility Option                        -c
- Utility Option with Option-Argument   -w width
- ___________________________________________________________________
-
-
- Defined terms are shown in three styles, depending on context:
-
- (1) Terms defined in 2.2.1, 2.2.2, and 3.1 are expressed as
- subclause titles. Alternative forms of the terms appear in
- [brackets].
-
- (2) The initial appearances of other terms, applying to a limited
- portion of the text, are in italics.
-
- (3) Subsequent appearances of the term are in the Roman font.
-
- Symbolic constants are shown in two styles: those within curly braces
- are intended to call the reader's attention to values in <limits.h> and
- <unistd.h>; those without braces are usually defined by one or a few
- related functions. There is no semantic difference between these two
- forms of presentation.
-
- Filenames and pathnames are shown in Courier. When a pathname is shown
- starting with ``$HOME/'', this indicates the remaining components of the
- pathname are to be related to the directory named by the user's HOME
- environment variable.
-
- The style selected for some of the special characters, such as <newline>,
- matches the form of the input given to the localedef utility (see 2.5.2).
- Generally, the characters selected for this special treatment are those
- that are not visually distinct, such as the control characters <tab> or
- <newline>.
-
- Literal characters and strings used as input or output are shown in
- various ways, depending on context:
-
- %, begin    When no confusion would result, the character or string is
-             rendered in the Courier font and used directly in the
-             text.
-
- 'c'         In some cases a character is enclosed in single-quote
-             characters, similar to a C-language character constant.
-             Unless otherwise noted, the quotes shall not be used as
-             input or output.
-
- "string"    In some cases, a string is enclosed in double-quote
-             characters, similar to a C-language string constant.
-             Unless otherwise noted, the quotes shall not be used as
-             input or output.
-
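The special-character names and the quoting conventions describe characters, not keystrokes. In the sketch below, the text a<tab>b<newline> is produced with printf escape sequences, and the single quotes are consumed by the shell rather than written as output. The function name and sample text are hypothetical.

```shell
#!/bin/sh
# Write the four characters  a <tab> b <newline>.
# The single quotes delimit the format string for the shell; they are
# not part of the output.
tab_separated_pair() {
    printf 'a\tb\n'
}

# Count the bytes written. Arithmetic expansion strips any padding
# that wc may place around the number.
echo $(( $(tab_separated_pair | wc -c) ))    # prints 4
```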
- Defined names that are usually in lowercase, particularly function names,
- are never used at the beginning of a sentence or anywhere else that
- regular English usage would require them to be capitalized.
-
- Parenthetical expressions within normative text also contain normative
- information. The general typographic hierarchy of parenthetical
- expressions is:
-
- { [ ( ) ] }
-
- The square brackets are most frequently used to enclose a parenthetical
- expression that contains a function name [such as waitpid()], with its
- built-in parentheses.
-
- In some cases, tabular information is presented inline; in others it is
- presented in a separately-labeled Table. This arrangement was employed
- purely for ease of reference and there is no normative difference between
- these two cases.
-
- Annexes marked as normative are parts of the standard that pose
- requirements, exactly the same as the numbered Sections, but have been
- moved to near the end of the document for clarity of exposition.
- Informative Annexes are for information only and pose no requirements.
- All material preceding page 1 of the document (the ``front matter'') and
- the two indexes at the end are also only informative.
-
- NOTES that appear in a smaller point size and are indented have one of
- two different meanings, depending on their location:
-
- - When they are within the normal text of the document, they are the
- same as footnotes--informative, posing no requirements on
- implementations or applications.
-
- - When they are attached to Tables or Figures, they are normative,
- posing requirements.
-
- Text marked as examples (including the use of ``e.g.'') is for
- information only. The exception to this comes in the C-language programs
- and program fragments used to represent algorithms, as described in
- 2.1.3.
-
- The typographical conventions listed here are for ease of reading only.
- Editorial inconsistencies in the use of typography are unintentional and
- have no normative meaning in this standard.
-
-
- 2.1.2 Grammar Conventions
-
- Portions of this standard are expressed in terms of a special grammar
- notation. It is used to portray the complex syntax of certain program
- input. The grammar is based on the syntax used by the yacc utility (see
- A.3). However, it does not represent fully functional yacc input,
- suitable for program use: the lexical processing and all semantic
- requirements are described only in textual form. The grammar is not
- based on source used in any traditional implementation and has not been
- tested with the semantic code that would normally be required to
- accompany it. Furthermore, there is no implication that the partial yacc
- code presented represents the most efficient, or only, means of
- supporting the complex syntax within the utility. Implementations may
- use other programming languages or algorithms, as long as the syntax
- supported is the same as that represented by the grammar.
-
- The following typographical conventions are used in the grammar; they
- have no significance except to aid in reading.
-
- - The identifiers for the reserved words of the language are shown
- with a leading capital letter. (These are terminals in the
- grammar. Examples: While, Case.)
-
- - The identifiers for terminals in the grammar are all named with
- uppercase letters and underscores. Examples: NEWLINE, ASSIGN_OP,
- NAME.
-
- - The identifiers for nonterminals are all lowercase.
-
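The three spelling conventions in 2.1.2 are regular enough to check mechanically. The following sketch classifies an identifier by its spelling alone; the function name classify_symbol and the category labels are hypothetical, and the check says nothing about whether a name actually occurs in any grammar of this standard.

```shell
#!/bin/sh
# Classify a grammar identifier by the spelling conventions of 2.1.2.
classify_symbol() {
    case $1 in
        '' | *[![:alpha:]_]*)
            echo unknown ;;             # empty, or has other characters
        *[[:lower:]]*)
            # Contains a lowercase letter: nonterminal or reserved word.
            case $1 in
                *[[:upper:]]*)
                    # Mixed case: a reserved word is one capital letter
                    # followed by lowercase letters only (While, Case).
                    rest=${1#?}
                    case $1 in
                        [[:upper:]]*)
                            case $rest in
                                *[![:lower:]]*) echo unknown ;;
                                *) echo reserved_word ;;
                            esac ;;
                        *) echo unknown ;;
                    esac ;;
                *) echo nonterminal ;;
            esac ;;
        *)
            echo terminal ;;            # uppercase letters and underscores
    esac
}

for symbol in While NEWLINE ASSIGN_OP simple_command; do
    printf '%s: %s\n' "$symbol" "$(classify_symbol "$symbol")"
done
```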
-
- 2.1.3 Miscellaneous Conventions
-
- This standard frequently uses the C language to express algorithms in
- terms of programs or program fragments. The following shall be
- considered in reading this code:
-
- - The programs use the syntax and semantics described by the
- C Standard {7}.
-
- - The programs are merely examples and do not represent the most
- efficient, or only, means of coding the interface. Implementations
- may use other programming languages or algorithms, as long as the
- results are the same as those achieved by the programs in this
- standard.
-
- - C-language comments are informative and pose no requirements.
-
- Further conventions are presented in:
-
- - Utility Conventions, 2.10, describing utility and application
- command-line syntax
-
- - File Format Notation, 2.12, describing the notation used to
- represent utility input and output
-
-
- 2.1.4 Conventions Rationale. (This subclause is not a part of P1003.2)
-
- The C language was chosen for many examples because:
-
- - It eliminates any requirement to document a different pseudocode.
-
- - It is a familiar language to many of the potential readers of
- POSIX.2.
-
- - It is the language most widely used for historical implementations
- of the utilities.
-
-
- 2.2 Definitions
-
-
- 2.2.1 Terminology
-
- For the purposes of this standard, the following definitions apply:
-
-
- 2.2.1.1 can: The word can is to be interpreted as describing a
- permissible optional feature or behavior available to the application;
- the implementation shall support such features or behaviors as mandatory
- requirements.
-
- 2.2.1.2 conformance document: A document provided by an implementor
- that contains implementation details as described in 1.3.1.2.
-
-
- 2.2.1.3 implementation: An object providing to applications and users
- the services defined by this standard. The word implementation is to be
- interpreted to mean that object, after it has been modified in accordance
- with the manufacturer's instructions to:
-
- - configure it for conformance with this standard;
-
- - select some of the various optional facilities described by this
- standard, through customization by local system administrators or
- operators.
-
- An exception to this meaning occurs when discussing conformance
- documentation or using the term implementation defined. See 2.2.1.4 and
- 1.3.1.2.
-
- 2.2.1.4 implementation defined: When a value or behavior is described
- by this standard as implementation defined, the implementation provider
- shall document the requirements for correct program construction and
- correct data in the use of that value or behavior. When the value or
- behavior in the implementation is designed to be variable or customizable
- on each instantiation of the system, the implementation provider shall
- document the nature and permissible ranges of this variation. (See
- 1.3.1.2.)
-
-
- 2.2.1.5 may: The word may is to be interpreted as describing an
- optional feature or behavior of the implementation that is not required
- by this standard, but there is no prohibition against providing it. A
- Strictly Conforming POSIX.2 Application is permitted to use such
- features, but shall not rely on the implementation's actions in such
- cases. To avoid ambiguity, the reverse sense of may is not expressed as
- may not, but as need not.
-
-
- 2.2.1.6 obsolescent: Certain features are obsolescent, which means that
- they may be considered for withdrawal in future revisions of this
- standard. They are retained in this version because of their widespread
- use. Their use in new applications is discouraged.
-
-
- 2.2.1.7 shall: In this standard, the word shall is to be interpreted as
- a requirement on the implementation or on Strictly Conforming POSIX.2
- Applications, where appropriate.
-
- 2.2.1.8 should: With respect to implementations, the word should is to
- be interpreted as an implementation recommendation, but not a
- requirement. With respect to applications, the word should is to be
- interpreted as recommended programming practice for applications and a
- requirement for Strictly Conforming POSIX.2 Applications.
-
-
- 2.2.1.9 system documentation: All documentation provided with an
- implementation, except the conformance document. Electronically
- distributed documents for an implementation are considered part of the
- system documentation.
-
- 2.2.1.10 undefined: A value or behavior is undefined if the standard
- imposes no portability requirements on applications for erroneous program
- construction, erroneous data, or use of an indeterminate value.
- Implementations (or other standards) may specify the result of using that
- value or causing that behavior. An application using such behaviors is
- using extensions, as defined in 1.3.2.3.
-
-
- 2.2.1.11 unspecified: A value or behavior is unspecified if the
- standard imposes no portability requirements on applications for a
- correct program construction or correct data. Implementations (or other
- standards) may specify the result of using that value or causing that
- behavior. An application requiring a specific behavior, rather than
- tolerating any behavior when using that functionality, is using
- extensions, as defined in 1.3.2.3.
-
- BEGIN_RATIONALE
-
- 2.2.1.12 Terminology Rationale (This subclause is not a part of P1003.2)
-
- Most of these terms were adapted from their POSIX.1 {8} counterparts with
- little modification.
-
- The reader is referred to the definition of program in 2.2.2.119 to
- understand the expression ``program construction.'' The use of program
- in this standard is differentiated from POSIX.1 {8}'s emphasis only on
- high level languages by this standard's broader concern with utility and
- command language interactions. Included in the scope of program
- construction are:
-
- (1) Shell command language
-
- (2) Command arguments
-
- (3) Regular expressions, of various types
-
- (4) Command input language syntax, such as awk, bc, ed, lex, make,
- sed, and yacc. Some of these are so complex that they rival
- traditional high level languages.
-
- The usages of can and may were selected to contrast optional application
- behavior (can) against optional implementation behavior (may).
-
- The term supported was removed from Draft 8; it had originally been
- copied from the POSIX.1 {8} document, but it later became clear that its
- requirement for function ``stubs'' for unsupported functions made little
- sense in this standard. The term support therefore reverts to its
- English-language meaning.
-
- The term obsolescent was changed to deprecated in some earlier drafts,
- but it was restored to match POSIX.1 {8}'s use of the term. It means
- ``do not use this feature in new applications.'' The obsolescence
- concept is not an ideal solution, but was used as a method of increasing
- consensus: many more objections would be heard from the user community
- if some of these historical features were suddenly withdrawn without the
- grace period obsolescence implies. The phrase ``may be considered for
- withdrawal in future revisions'' implies that the result of that
- consideration might in fact keep those features indefinitely if the
- predominance of applications does not migrate away from them quickly.
-
- END_RATIONALE
-
-
- 2.2.2 General Terms
-
- For the purposes of this standard, the following definitions apply.
-
-
- 2.2.2.1 absolute pathname: See pathname resolution in 2.2.2.104.
-
- 2.2.2.2 address space: The memory locations that can be referenced by a
- process. [POSIX.1 {8}]
-
-
- 2.2.2.3 affirmative response: An input string that matches one of the
- responses acceptable to the LC_MESSAGES category keyword yesexpr,
- matching an extended regular expression in the current locale; see 2.5.
-
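One way to test for an affirmative response is sketched below. It assumes that the locale utility writes the current yesexpr pattern when given that keyword; where the utility is missing or writes nothing, the sketch falls back to '^[yY]', the value assumed here for the POSIX Locale. The function name is_affirmative is hypothetical.

```shell
#!/bin/sh
# Succeed if the response matches the locale's yesexpr pattern, which
# is an extended regular expression.
is_affirmative() {
    pattern=$(locale yesexpr 2>/dev/null)
    # Fallback assumed for the POSIX Locale.
    [ -n "$pattern" ] || pattern='^[yY]'
    printf '%s\n' "$1" | grep -E -q -e "$pattern"
}

if is_affirmative "yes"; then
    echo affirmative
else
    echo negative
fi
```

Querying the pattern each time, rather than hard-coding y and Y, is what allows the same script to accept the responses of whatever locale the user has selected.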
- 2.2.2.4 <alert>: A character that in the output stream shall indicate
- that a terminal should alert its user via a visual or audible
- notification.
-
- The <alert> shall be the character designated by '\a' in the C language
- binding. It is unspecified whether this character is the exact sequence
- transmitted to an output device by the system to accomplish the alert
- function.
-
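The character behind a C-language escape sequence can be inspected from the shell. This sketch assumes that the printf utility accepts the same '\a' and '\b' escapes as the C language binding and that the codeset is ASCII-based; the function name octal_of is hypothetical.

```shell
#!/bin/sh
# Print the octal value of each byte that printf writes for a format
# string. od -An -t o1 dumps bytes as octal without an address column;
# tr then removes the spacing so only the digits remain.
octal_of() {
    printf "$1" | od -An -t o1 | tr -d ' \n'
    echo
}

octal_of '\a'    # prints 007 in an ASCII-based codeset
octal_of '\b'    # prints 010 in an ASCII-based codeset
```

The values show only what the utility writes; as the definition notes, the sequence that finally reaches an output device is unspecified.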
-
- 2.2.2.5 angle brackets: The characters ``<'' (left-angle-bracket) and
- ``>'' (right-angle-bracket).
-
- When used in the phrase ``enclosed in angle brackets'' the symbol ``<''
- shall immediately precede the object to be enclosed, and ``>'' shall
- immediately follow it. When describing these characters in 2.4, the
- names <less-than-sign> and <greater-than-sign> are used.
-
- 2.2.2.6 appropriate privileges: An implementation-defined means of
- associating privileges with a process with regard to the function calls
- and function call options defined in POSIX.1 {8} that need special
- privileges.
-
- There may be zero or more such means. [POSIX.1 {8}]
-
-
- 2.2.2.7 argument: A parameter passed to a utility as the equivalent of
- a single string in the argv array created by one of the POSIX.1 {8} exec
- functions.
-
- See 2.10.1 and 3.9.1.1. An argument is one of the options, option-
- arguments, or operands following the command name.
-
- 2.2.2.8 asterisk: The character ``*''.
-
-
- 2.2.2.9 background process: A process that is a member of a background
- process group. [POSIX.1 {8}]
-
- 2.2.2.10 background process group: Any process group, other than a
- foreground process group, that is a member of a session that has
- established a connection with a controlling terminal. [POSIX.1 {8}]
-
-
- 2.2.2.11 backquote: The character ```'', also known as a grave accent.
-
- 2.2.2.12 backslash: The character ``\'', also known as a reverse
- solidus.
-
-
- 2.2.2.13 <backspace>: A character that normally causes printing (or
- displaying) to occur one column position previous to the position about
- to be printed.
-
- The <backspace> shall be the character designated by '\b' in the C
- language binding. It is unspecified whether this character is the exact
- sequence transmitted to an output device by the system to accomplish the
- backspace function. The <backspace> character defined here is not
- necessarily the ERASE special character defined in POSIX.1 {8} 7.1.1.9.
-
- 2.2.2.14 basename: The final, or only, filename in a pathname.
-
-
- 2.2.2.15 basic regular expression: A pattern (sequence of characters or
- symbols) constructed according to the rules defined in 2.8.3.
-
- 2.2.2.16 <blank>: One of the characters that belong to the blank
- character class as defined via the LC_CTYPE category in the current
- locale.
-
- In the POSIX Locale, a <blank> is either a <tab> or a <space>.
-
-
- 2.2.2.17 blank line: A line consisting solely of zero or more <blank>s
- terminated by a <newline>.
-
- See also empty line (2.2.2.44).
-
- 2.2.2.18 block special file: A file that refers to a device.
-
- A block special file is normally distinguished from a character special
- file by providing access to the device in a manner such that the hardware
- characteristics of the device are not visible. [POSIX.1 {8}]
-
-
- 2.2.2.19 braces: The characters ``{'' (left brace) and ``}'' (right
- brace), also known as curly braces.
-
- When used in the phrase ``enclosed in (curly) braces'' the symbol ``{''
- shall immediately precede the object to be enclosed, and ``}'' shall
- immediately follow it. When describing these characters in 2.4, the
- names <left-brace> and <right-brace> are used.
-
- 2.2.2.20 brackets: The characters ``['' (left-bracket) and ``]''
- (right-bracket), also known as square brackets.
-
- When used in the phrase ``enclosed in (square) brackets'' the symbol
- ``['' shall immediately precede the object to be enclosed, and ``]''
- shall immediately follow it. When describing these characters in 2.4,
- the names <left-square-bracket> and <right-square-bracket> are used.
-
-
- 2.2.2.21 built-in utility: A utility implemented within a shell.
-
- The utilities referred to as special built-ins have special qualities,
- described in 3.14. Unless qualified, the term built-in includes the
- special built-in utilities.
-
- The utilities referred to as regular built-ins are those named in
- Table 2-2. As indicated in 2.3, there is no requirement that these
- utilities actually be built into the shell on the implementation, but
- they do have special command-search qualities.
-
- 2.2.2.22 byte: An individually addressable unit of data storage that is
- equal to or larger than an octet, used to store a character or a portion
- of a character; see 2.2.2.24.
-
- A byte is composed of a contiguous sequence of bits, the number of which
- is implementation defined. The least significant bit is called the
- low-order bit; the most significant is called the high-order bit.
- [POSIX.1 {8}]
-
- NOTE: This definition of byte is actually from the C Standard {7}
- because POSIX.1 {8} merely references it without copying the text. It
- has been reworded slightly to clarify its intent without introducing the
- C Standard {7} terminology ``basic execution character set,'' which is
- inapplicable to this standard. It deviates intentionally from the usage
- of byte in some other standards, where it is used as a synonym for octet
- (always eight bits). On a POSIX.1 {8} system, a byte may be larger than
- eight bits so that it can be an integral portion of larger data objects
- that are not evenly divisible by eight bits (such as a 36-bit word that
- contains four 9-bit bytes).
-
- 2.2.2.23 <carriage-return>: A character that in the output stream shall
- indicate that printing should start at the beginning of the same physical
- line in which the <carriage-return> occurred.
-
- The <carriage-return> shall be the character designated by '\r' in the C
- language binding. It is unspecified whether this character is the exact
- sequence transmitted to an output device by the system to accomplish the
- movement to the beginning of the line.
-
-
- 2.2.2.24 character: A sequence of one or more bytes representing a
- single graphic symbol.
- NOTE: This term corresponds in the C Standard {7} to the term multibyte
- character, noting that a single-byte character is a special case of
- multibyte character. Unlike the usage in the C Standard {7}, character
- here has no necessary relationship with storage space, and byte is used
- when storage space is discussed.
-
- [POSIX.1 {8}]
-
- (See 2.4 for a further explanation of the graphical representations of
- characters, or ``glyphs,'' versus character encodings.)
-
- 2.2.2.25 character class: A named set of characters sharing an
- attribute associated with the name of the class.
-
- The classes and the characters that they contain are dependent on the
- value of the LC_CTYPE category in the current locale; see 2.5.
-
-
- 2.2.2.26 character special file: A file that refers to a device.
-
- One specific type of character special file is a terminal device file,
- whose access is defined in POSIX.1 {8} section 7.1. Other character
- special files have no structure defined by this standard, and their use
- is unspecified by this standard. [POSIX.1 {8}]
-
- 2.2.2.27 circumflex: The character ``^''.
-
-
- 2.2.2.28 collating element: The smallest entity used to determine the
- logical ordering of strings.
-
- See collation sequence (2.2.2.30). A collating element shall consist of
- either a single character, or two or more characters collating as a
- single entity. The value of the LC_COLLATE category in the current
- locale determines the current set of collating elements.
-
- 2.2.2.29 collation: The logical ordering of strings according to
- defined precedence rules.
-
- These rules identify a collation sequence between the collating elements,
- and such additional rules that can be used to order strings consisting of
- multiple collating elements.
-
-
- 2.2.2.30 collation sequence: The relative order of collating elements
- as determined by the setting of the LC_COLLATE category in the current
- locale.
-
- The character order, as defined for the LC_COLLATE category in the
- current locale (see 2.5.2.2), defines the relative order of all collating
- elements, such that each element occupies a unique position in the order.
- In addition, one or more collation weights can be assigned for each
- collating element; these weights are used to determine the relative order
- of strings in, e.g., the sort utility.
-
- Multilevel sorting is accomplished by assigning elements one or more
- collation weights, up to the limit {COLL_WEIGHTS_MAX}. On each level,
- elements may be given the same weight (at the primary level, called an
- equivalence class; see 2.2.2.47) or be omitted from the sequence.
- Strings that collate equal using the first assigned weight (primary
- ordering) are then compared using the next assigned weight (secondary
- ordering), and so on.
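-
- The multilevel comparison described above can be sketched as follows.
- This is an illustration only: the weight table, the two-level limit, and
- the names WEIGHTS, collation_key, and collate are assumptions made for
- the example and are not defined by this standard. Each string is compared
- on all of its primary weights first; secondary weights are consulted only
- when the primary weights compare equal.

```python
# Hypothetical weight table: collating element -> (primary, secondary).
# "a" and "A" share a primary weight, so they form an equivalence class;
# likewise "b" and "B".
WEIGHTS = {
    "a": (1, 1),
    "A": (1, 2),
    "b": (2, 1),
    "B": (2, 2),
}


def collation_key(s):
    """Return a sort key: all primary weights, then all secondary weights."""
    primary = tuple(WEIGHTS[element][0] for element in s)
    secondary = tuple(WEIGHTS[element][1] for element in s)
    return (primary, secondary)


def collate(strings):
    """Order strings by primary weights, breaking ties on secondary weights."""
    return sorted(strings, key=collation_key)
```

- With this table, "ab" and "Ab" tie at the primary level and are separated
- only by their secondary weights.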
-
- 2.2.2.31 column position: A unit of horizontal measure related to
- characters in a line.
-
- It is assumed that each character in a character set has an intrinsic
- column width independent of any output device. Each printable character
- in the portable character set has a column width of one. The standard
- utilities, when used as described in this standard, assume that all
- characters have integral column widths. The column width of a character
- is not necessarily related to the internal representation of the
- character (numbers of bits or octets).
-
- The column position of a character in a line is defined as one plus the
- sum of the column widths of the preceding characters in the line. Column
- positions are numbered starting from 1.
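-
- As a small worked illustration of this definition, the sketch below
- computes a column position from per-character column widths. The function
- name column_position and the width_of parameter are hypothetical; the
- default width of one matches only the printable characters of the
- portable character set.

```python
def column_position(line, index, width_of=lambda ch: 1):
    """Return the column position of line[index].

    The result is one plus the sum of the column widths of the
    characters that precede line[index] in the line.
    """
    return 1 + sum(width_of(ch) for ch in line[:index])
```

- A caller that knows of wider characters can pass its own width_of
- function; the first character of a line is always at column position 1.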
-
-
- 2.2.2.32 command: A directive to the shell to perform a particular
- task; see 3.9.
-
- 2.2.2.33 current working directory: See working directory in 2.2.2.159.
-
- 2.2.2.34 command language interpreter: See 2.2.2.133.
-
-
- 2.2.2.35 directory: A file that contains directory entries.
-
- No two directory entries in the same directory shall have the same name.
- [POSIX.1 {8}]
-
- 2.2.2.36 directory entry [link]: An object that associates a filename
- with a file.
-
- Several directory entries can associate names with the same file.
- [POSIX.1 {8}]
-
-
- 2.2.2.37 dollar-sign: The character ``$''.
-
- This standard permits the substitution of the ``currency symbol'' graphic
- defined in ISO/IEC 646 {1} for this symbol when the character set being
- used has substituted that graphic for the graphic $. The graphic symbol
- $ is always used in this standard, but not in any monetary sense.
-
- 2.2.2.38 dot: The filename consisting of a single dot character (.).
-
- See pathname resolution in 2.2.2.104. [POSIX.1 {8}]
-
- In the context of shell special built-in utilities, see 3.14.4.
-
-
- 2.2.2.39 dot-dot: The filename consisting solely of two dot characters
- (..).
-
- See pathname resolution in 2.2.2.104. [POSIX.1 {8}]
-
- 2.2.2.40 double-quote: The character ``"'', also known as
- quotation-mark.
-
-
- 2.2.2.41 effective group ID: An attribute of a process that is used in
- determining various permissions, including file access permissions,
- described in 2.2.2.55.
-
- See group ID. This value is subject to change during the process
- lifetime, as described in POSIX.1 {8} 3.1.2 (exec) and 4.2.2 [setgid()].
- [POSIX.1 {8}]
-
- 2.2.2.42 effective user ID: An attribute of a process that is used in
- determining various permissions, including file access permissions.
-
- See user ID. This value is subject to change during the process
- lifetime, as described in POSIX.1 {8} 3.1.2 (exec) and 4.2.2 [setuid()].
- [POSIX.1 {8}]
-
-
- 2.2.2.43 empty directory: A directory that contains, at most, directory
- entries for dot and dot-dot. [POSIX.1 {8}]
-
- 2.2.2.44 empty line: A line consisting of only a <newline> character.
-
- See also blank line (2.2.2.17).
-
-
- 2.2.2.45 empty string [null string]: A character array whose first
- element is a null character. [POSIX.1 {8}]
-
- 2.2.2.46 Epoch: The time 0 hours, 0 minutes, 0 seconds, January 1,
- 1970, Coordinated Universal Time.
-
- See seconds since the Epoch. [POSIX.1 {8}]
-
-
- 2.2.2.47 equivalence class: A set of collating elements with the same
- primary collation weight.
-
- Elements in an equivalence class are typically elements that naturally
- group together, such as all accented letters based on the same base
- letter.
-
- The collation order of elements within an equivalence class is determined
- by the weights assigned on any subsequent levels after the primary
- weight.
-
- 2.2.2.48 executable file: A regular file acceptable as a new process
- image file by the equivalent of the POSIX.1 {8} exec family of functions,
- and thus usable as one form of a utility.
-
- See exec in POSIX.1 {8} 3.1.2. The standard utilities described as
- compilers can produce executable files, but other unspecified methods of
- producing executable files may also be provided. The internal format of
- an executable file is unspecified, but a conforming application shall not
- assume an executable file is a text file.
-
-
- 2.2.2.49 execute: To perform the actions described in 3.9.1.1.
-
- See also invoke (2.2.2.79).
-
- 2.2.2.50 extended regular expression: A pattern (sequence of characters
- or symbols) constructed according to the rules defined in 2.8.4.
-
-
- 2.2.2.51 extended security controls: A concept of the underlying
- system, as follows. [POSIX.1 {8}]
-
- The access control (see file access permissions) and privilege (see
- appropriate privileges in 2.2.2.6) mechanisms have been defined to allow
- implementation-defined extended security controls. These permit an
- implementation to provide security mechanisms to implement different
- security policies than described in POSIX.1 {8}. These mechanisms shall
- not alter or override the defined semantics of any of the functions in
- POSIX.1 {8}.
-
- 2.2.2.52 feature test macro: A #defined symbol used to determine
- whether a particular set of features will be included from a header.
-
- See POSIX.1 {8} 2.7.1. [POSIX.1 {8}]
-
-
- 2.2.2.53 FIFO special file [FIFO]: A type of file with the property
- that data written to such a file is read on a first-in-first-out basis.
-
- Other characteristics of FIFOs are described in POSIX.1 {8} 5.3.1
- [open()], 6.4.1 [read()], 6.4.2 [write()], and 6.5.3 [lseek()].
- [POSIX.1 {8}]
-
- 2.2.2.54 file: An object that can be written to, or read from, or both.
-
- A file has certain attributes, including access permissions and type.
- File types include regular file, character special file, block special
- file, FIFO special file, and directory. Other types of files may be
- defined by the implementation. [POSIX.1 {8}]
-
-
- 2.2.2.55 file access permissions: A concept of the underlying system,
- as follows. [POSIX.1 {8}]
-
- The standard file access control mechanism uses the file permission bits,
- as described below. These bits are set at file creation by open(),
- creat(), mkdir(), and mkfifo() and are changed by chmod(). These bits
- are read by stat() or fstat().
-
- Implementations may provide additional or alternate file access control
- mechanisms, or both. An additional access control mechanism shall only
- further restrict the access permissions defined by the file permission
- bits. An alternate access control mechanism shall:
-
- (1) Specify file permission bits for the file owner class, file
- group class, and file other class of the file, corresponding to
- the access permissions, to be returned by stat() or fstat().
-
- (2) Be enabled only by explicit user action, on a per-file basis by
- the file owner or a user with the appropriate privilege.
-
- (3) Be disabled for a file after the file permission bits are
- changed for that file with chmod(). The disabling of the
- alternate mechanism need not disable any additional mechanisms
- defined by an implementation.
-
- Whenever a process requests file access permission for read, write, or
- execute/search, if no additional mechanism denies access, access is
- determined as follows:
-
- (1) If a process has the appropriate privilege:
-
- (a) If read, write, or directory search permission is
- requested, access is granted.
-
- (b) If execute permission is requested, access is granted if
- execute permission is granted to at least one user by the
- file permission bits or by an alternate access control
- mechanism; otherwise, access is denied.
-
- (2) Otherwise:
-
- (a) The file permission bits of a file contain read, write,
- and execute/search permissions for the file owner class,
- file group class, and file other class.
-
- (b) Access is granted if an alternate access control mechanism
- is not enabled and the requested access permission bit is
- set for the class (file owner class, file group class, or
- file other class) to which the process belongs, or if an
- alternate access control mechanism is enabled and it
- allows the requested access; otherwise, access is denied.
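-
- The decision procedure above can be restated as a short sketch. It is a
- model for illustration, not an implementation of any real system
- interface: the function access_granted, its parameters, and the use of
- sets of the letters "r", "w", and "x" in place of permission bits are all
- assumptions of this example. The class to which the process belongs is
- supplied by the caller.

```python
def access_granted(requested, perm, process_class, *, is_directory=False,
                   privileged=False, additional_denies=False,
                   alternate=None, alternate_grants_execute=False):
    """Decide one access request.

    requested      -- "r", "w", or "x" (execute, or search for a directory)
    perm           -- {"owner": set, "group": set, "other": set} of letters
    process_class  -- "owner", "group", or "other"
    alternate      -- None if no alternate mechanism is enabled, else a
                      callable taking `requested` and returning a bool
    alternate_grants_execute -- whether an alternate mechanism grants
                      execute permission to at least one user
    """
    # An additional mechanism may only restrict further, so its denial is final.
    if additional_denies:
        return False

    if privileged:
        # Rule (1)(a): read, write, and directory search are granted.
        if requested in ("r", "w") or is_directory:
            return True
        # Rule (1)(b): execute requires execute permission for some user.
        any_execute_bit = any("x" in bits for bits in perm.values())
        return any_execute_bit or alternate_grants_execute

    # Rule (2)(b): the class bits decide unless an alternate mechanism is
    # enabled, in which case the alternate mechanism decides.
    if alternate is None:
        return requested in perm[process_class]
    return alternate(requested)
```

- Note that, under rule (1)(b), even a process with the appropriate
- privilege is denied execute access to a file for which no user has
- execute permission.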
-
-
- 2.2.2.56 file descriptor: A per-process unique, nonnegative integer
- used to identify an open file for the purpose of file access.
- [POSIX.1 {8}]
-
- 2.2.2.57 file group class: The property of a file indicating access
- permissions for a process related to the process's group identification.
-
- A process is in the file group class of a file if the process is not in
- the file owner class and if the effective group ID or one of the
- supplementary group IDs of the process matches the group ID associated
- with the file. Other members of the class may be implementation defined.
- [POSIX.1 {8}]
-
-
- 2.2.2.58 file hierarchy: A concept of the underlying system, as
- follows. [POSIX.1 {8}]
-
- Files in the system are organized in a hierarchical structure in which
- all of the nonterminal nodes are directories and all of the terminal
- nodes are any other type of file. Because multiple directory entries may
- refer to the same file, the hierarchy is properly described as a
- ``directed graph.''
-
- 2.2.2.59 file mode: An object containing the file permission bits and
- other characteristics of a file, as described in POSIX.1 {8} 5.6.1.
- [POSIX.1 {8}]
-
-
- 2.2.2.60 file mode bits: A file's file permission bits, set-user-ID-
- on-execution bit (S_ISUID), and set-group-ID-on-execution bit (S_ISGID)
- (see POSIX.1 {8} 5.6.1.2).
-
- 2.2.2.61 filename: A name consisting of 1 to {NAME_MAX} bytes used to
- name a file.
-
- The characters composing the name may be selected from the set of all
- character values excluding the slash character and the null character.
- The filenames dot and dot-dot have special meaning; see pathname
- resolution in 2.2.2.104. A filename is sometimes referred to as a
- pathname component. [POSIX.1 {8}]
-
-
- 2.2.2.62 filename portability: A concept of the underlying system, as
- follows. [POSIX.1 {8}]
-
- Filenames should be constructed from the portable filename character set
- because the use of other characters can be confusing or ambiguous in
- certain contexts.
-
- 2.2.2.63 file offset: The byte position in the file where the next I/O
- operation begins.
-
- Each open file description associated with a regular file, block special
- file, or directory has a file offset. A character special file that does
- not refer to a terminal device may have a file offset. There is no file
- offset specified for a pipe or FIFO. [POSIX.1 {8}]
-
- 2.2.2.64 file other class: The property of a file indicating access
- permissions for a process related to the process's user and group
- identification.
-
- A process is in the file other class of a file if the process is not in
- the file owner class or file group class. [POSIX.1 {8}]
-
-
- 2.2.2.65 file owner class: The property of a file indicating access
- permissions for a process related to the process's user identification.
-
- A process is in the file owner class of a file if the effective user ID
- of the process matches the user ID of the file. [POSIX.1 {8}]
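-
- Taken together, the definitions of the file owner class, file group
- class, and file other class place a process in exactly one class for a
- given file. The sketch below shows that selection; the function name
- file_class and its parameters are hypothetical, and it does not model the
- implementation-defined additional members that the file group class may
- have.

```python
def file_class(euid, egid, supplementary_gids, file_uid, file_gid):
    """Return "owner", "group", or "other" for a process and a file."""
    # File owner class: the effective user ID matches the file's user ID.
    if euid == file_uid:
        return "owner"
    # File group class: not in the owner class, and the effective group ID
    # or one of the supplementary group IDs matches the file's group ID.
    if egid == file_gid or file_gid in supplementary_gids:
        return "group"
    # File other class: in neither of the classes above.
    return "other"
```

- The owner test comes first, so a process whose effective user ID matches
- the file is in the file owner class even if its group IDs also match.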
-
- 2.2.2.66 file permission bits: Information about a file that is used,
- along with other information, to determine if a process has read, write,
- or execute/search permission to a file.
-
- The bits are divided into three parts: owner, group, and other. Each
- part is used with the corresponding file class of processes. These bits
- are contained in the file mode, as described in POSIX.1 {8} 5.6.1. The
- detailed usage of the file permission bits in access decisions is
- described in file access permissions in 2.2.2.55. [POSIX.1 {8}]
-
-
- 2.2.2.67 file serial number: A per-file-system unique identifier for a
- file.
-
- File serial numbers are unique throughout a file system. [POSIX.1 {8}]
-
- 2.2.2.68 file system: A collection of files and certain of their
- attributes.
-
- It provides a name space for file serial numbers referring to those
- files. [POSIX.1 {8}]
-
-
- 2.2.2.69 file times update: A concept of the underlying system, as
- follows. [POSIX.1 {8}]
-
- Each file has three distinct associated time values: st_atime, st_mtime,
- and st_ctime. The st_atime field is associated with the times that the
- file data is accessed; st_mtime is associated with the times that the
- file data is modified; and st_ctime is associated with the times that
- file status is changed. These values are returned in the file
- characteristics structure, as described in POSIX.1 {8} 5.6.1.
-
- Any function in this standard that is required to read or write file data
- or change the file status indicates which of the appropriate time-related
- fields are to be ``marked for update.'' If an implementation of such a
- function marks for update a time-related field not specified by this
- standard, this shall be documented, except that any changes caused by
- pathname resolution need not be documented. For the other functions in
- this standard (those that are not explicitly required to read or write
- file data or change file status, but that in some implementations happen
- to do so), the effect is unspecified.
-
- An implementation may update fields that are marked for update
- immediately, or it may update such fields periodically. When the fields
- are updated, they are set to the current time and the update marks are
- cleared. All fields that are marked for update shall be updated when the
- file is no longer open by any process, or when a stat() or fstat() is
- performed on the file. Other times at which updates are done are
- unspecified. Updates are not done for files on read-only file systems.
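-
- The distinction between marking a field for update and actually updating
- it can be modeled as below. This is a sketch under stated assumptions:
- the class FileTimes and its methods are invented for the example, time is
- represented as a plain number supplied by the caller, and read-only file
- systems are not modeled.

```python
TIME_FIELDS = ("st_atime", "st_mtime", "st_ctime")


class FileTimes:
    """Hold the three time values and the pending update marks of a file."""

    def __init__(self, now):
        self.st_atime = self.st_mtime = self.st_ctime = now
        self.marked = set()

    def mark_for_update(self, *fields):
        """Record that the named fields are marked for update."""
        for field in fields:
            if field not in TIME_FIELDS:
                raise ValueError(f"unknown time field: {field}")
            self.marked.add(field)

    def update(self, now):
        """Set every marked field to the current time and clear the marks."""
        for field in self.marked:
            setattr(self, field, now)
        self.marked.clear()
```

- In this model an implementation is free to call update() immediately or
- periodically, but it would have to do so no later than the last close of
- the file or the next stat() or fstat() on it.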
-
-
- 2.2.2.70 file type: See file in 2.2.2.54.
-
- 2.2.2.71 filter: A command whose operation consists of reading data
- from standard input or a list of input files and writing data to standard
- output.
-
- Typically, its function is to perform some transformation on the data
- stream.
-
-
- 2.2.2.72 foreground process: A process that is a member of a foreground
- process group. [POSIX.1 {8}]
-
- 2.2.2.73 foreground process group: A process group whose member
- processes have certain privileges, denied to processes in background
- process groups, when accessing their controlling terminal.
-
- Each session that has established a connection with a controlling
- terminal has exactly one process group of the session as the foreground
- process group of that controlling terminal. See POSIX.1 {8} 7.1.1.4.
- [POSIX.1 {8}]
-
-
- 2.2.2.74 <form-feed>: A character that in the output stream shall
- indicate that printing should start on the next page of an output device.
-
- The <form-feed> shall be the character designated by '\f' in the C
- language binding. If <form-feed> is not the first character of an output
- line, the result is unspecified. It is unspecified whether this
- character is the exact sequence transmitted to an output device by the
- system to accomplish the movement to the next page.
-
- 2.2.2.75 group ID: A nonnegative integer, which can be contained in an
- object of type gid_t, that is used to identify a group of system users.
-
- Each system user is a member of at least one group. When the identity of
- a group is associated with a process, a group ID value is referred to as
- a real group ID, an effective group ID, one of the (optional)
- supplementary group IDs, or an (optional) saved set-group-ID.
- [POSIX.1 {8}]
-
-
- 2.2.2.76 hard link: The relationship between two directory entries that
- represent the same file; the result of an execution of the ln utility or
- the POSIX.1 {8} link() function.
-
- 2.2.2.77 home directory: The current directory associated with a user
- at the time of login.
-
-
- 2.2.2.78 incomplete line: A sequence of text consisting of one or more
- non-<newline> characters at the end of the file.
-
- 2.2.2.79 invoke: To perform the actions described in 3.9.1.1, except
- that searching for shell functions and special built-ins is suppressed.
-
- See also execute (2.2.2.49).
-
-
- 2.2.2.80 job control: A facility that allows users to selectively stop
- (suspend) the execution of processes and continue (resume) their
- execution at a later point.
-
- The user typically employs this facility via the interactive interface
- jointly supplied by the terminal I/O driver and a command interpreter.
- POSIX.1 {8} conforming implementations may optionally support job control
- facilities; the presence of this option is indicated to the application
- at compile time or run time by the definition of the {_POSIX_JOB_CONTROL}
- symbol; see POSIX.1 {8} 2.9. [POSIX.1 {8}]
-
- 2.2.2.81 line: A sequence of text consisting of zero or more non-
- <newline> characters plus a terminating <newline> character.
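-
- The terms line, empty line, blank line, and incomplete line can be told
- apart mechanically. The sketch below labels each piece of a text with the
- most specific term that applies (every empty line is also a blank line,
- and both are lines). The function name classify_text is hypothetical, and
- the default set of <blank> characters assumes the POSIX Locale, where a
- <blank> is a <tab> or a <space>.

```python
def classify_text(text, blanks=" \t"):
    """Return a list of (piece, label) pairs for the given text."""
    pieces = text.split("\n")
    labelled = []
    # Every piece except the last was terminated by a <newline>.
    for body in pieces[:-1]:
        if body == "":
            label = "empty line"
        elif all(ch in blanks for ch in body):
            label = "blank line"
        else:
            label = "line"
        labelled.append((body + "\n", label))
    # Trailing non-<newline> characters at the end form an incomplete line.
    if pieces[-1]:
        labelled.append((pieces[-1], "incomplete line"))
    return labelled
```

- Text that ends with a <newline> produces no incomplete line, and empty
- input produces no pieces at all.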
-
-
- 2.2.2.82 link: See directory entry in 2.2.2.36.
-
- 2.2.2.83 link count: The number of directory entries that refer to a
- particular file. [POSIX.1 {8}]
-
- 2.2.2.84 locale: The definition of the subset of a user's environment
- that depends on language and cultural conventions; see 2.5.
-
-
- 2.2.2.85 login: The unspecified activity by which a user gains access
- to the system.
-
- Each login shall be associated with exactly one login name.
- [POSIX.1 {8}]
-
- 2.2.2.86 login name: A user name that is associated with a login.
- [POSIX.1 {8}]
-
-
- 2.2.2.87 mode: A collection of attributes that specifies a file's type
- and its access permissions.
-
- See file access permissions in 2.2.2.55. [POSIX.1 {8}]
-
- 2.2.2.88 multicharacter collating element: A sequence of two or more
- characters that collate as an entity.
-
- For example, in some coded character sets, an accented character is
- represented by a (nonspacing) accent, followed by the letter. Another
- example is the Spanish elements ``ch'' and ``ll.''
-
-
- 2.2.2.89 negative response: An input string that matches one of the
- responses acceptable to the LC_MESSAGES category keyword noexpr, matching
- an extended regular expression in the current locale.
-
- See 2.5.
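-
- Matching a response against the yesexpr and noexpr keywords might look
- like the sketch below. Several things here are assumptions: the function
- name classify_response is invented, the two default patterns are
- placeholders rather than values taken from any locale definition, and
- Python's re module is used as a stand-in that agrees with extended
- regular expressions only for simple patterns such as these.

```python
import re


def classify_response(response, yesexpr=r"^[yY]", noexpr=r"^[nN]"):
    """Return "affirmative", "negative", or None for an input string."""
    if re.search(yesexpr, response):
        return "affirmative"
    if re.search(noexpr, response):
        return "negative"
    # The response matches neither expression.
    return None
```

- A real utility would obtain both expressions from the LC_MESSAGES
- category of the current locale rather than hard-coding them.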
-
- 2.2.2.90 <newline>: A character that in the output stream shall
- indicate that printing should start at the beginning of the next line.
-
- The <newline> shall be the character designated by '\n' in the C language
- binding. It is unspecified whether this character is the exact sequence
- transmitted to an output device by the system to accomplish the movement
- to the next line.
-
-
- 2.2.2.91 NUL: A character with all bits set to zero.
-
- 2.2.2.92 null string: See empty string in 2.2.2.45.
-
- 2.2.2.93 number-sign: The character ``#''.
-
- This standard permits the substitution of the ``pound sign'' graphic
- defined in ISO/IEC 646 {1} for this symbol when the character set being
- used has substituted that graphic for the graphic #. The graphic symbol
- # is always used in this standard.
-
-
- 2.2.2.94 object file: A regular file containing the output of a
- compiler, formatted as input to a linkage editor for linking with other
- object files into an executable form.
-
- The methods of linking are unspecified and may involve the dynamic
- linking of objects at run-time. The internal format of an object file is
- unspecified, but a conforming application shall not assume an object file
- is a text file.
-
- 2.2.2.95 open file: A file that is currently associated with a file
- descriptor. [POSIX.1 {8}]
-
-
- 2.2.2.96 operand: An argument to a command that is generally used as an
- object supplying information to a utility necessary to complete its
- processing.
-
- Operands generally follow the options in a command line. See 2.10.1.
-
- 2.2.2.97 option: An argument to a command that is generally used to
- specify changes in the utility's default behavior; see 2.10.1.
-
-
- 2.2.2.98 option-argument: A parameter that follows certain options.
-
- In some cases an option-argument is included within the same argument
- string as the option; in most cases it is the next argument. See 2.10.1.
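-
- To make the terms option, option-argument, and operand concrete, the
- sketch below separates an argument list with Python's standard getopt
- module, which follows similar conventions. The utility, its options -a
- and -o, and the wrapper name split_arguments are hypothetical; in the
- option string "ao:", the colon marks -o as taking an option-argument.

```python
import getopt


def split_arguments(args, optstring):
    """Split args into (option, option-argument) pairs and operands."""
    options, operands = getopt.getopt(args, optstring)
    return options, operands
```

- The option-argument may be the next argument ("-o out") or part of the
- same argument string as the option ("-oout"); both forms yield the same
- pair.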
-
- 2.2.2.99 parent directory:
-
- (1) When discussing a given directory, the directory that both
- contains a directory entry for the given directory and is
- represented by the pathname dot-dot in the given directory.
-
- (2) When discussing other types of files, a directory containing a
- directory entry for the file under discussion.
-
- This concept does not apply to dot and dot-dot. [POSIX.1 {8}]
-
- 2.2.2.100 parent process: See process in 2.2.2.114. [POSIX.1 {8}]
-
-
- 2.2.2.101 parent process ID: An attribute of a new process after it is
- created by a currently active process.
-
- The parent process ID of a process is the process ID of its creator, for
- the lifetime of the creator. After the creator's lifetime has ended, the
- parent process ID is the process ID of an implementation-defined system
- process. [POSIX.1 {8}]
-
- 2.2.2.102 pathname: A string that is used to identify a file.
-
- A pathname consists of, at most, {PATH_MAX} bytes, including the
- terminating null character. It has an optional beginning slash, followed
- by zero or more filenames separated by slashes. If the pathname refers
- to a directory, it may also have one or more trailing slashes. Multiple
- successive slashes are considered to be the same as one slash. A
- pathname that begins with two successive slashes may be interpreted in an
- implementation-defined manner, although more than two leading slashes
- shall be treated as a single slash. The interpretation of the pathname
- is described in pathname resolution in 2.2.2.104. [POSIX.1 {8}]
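-
- The slash rules above can be illustrated with a short C sketch. The
- function name collapse_slashes is hypothetical and is not defined by
- POSIX.1 {8} or by this standard. It assumes a writable, null-terminated
- buffer, and it only rewrites the string; it does not perform pathname
- resolution.
-
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of any standard interface.
 * Rewrites `path` in place so that each run of successive slashes
 * becomes a single slash. Exactly two leading slashes are preserved,
 * since their interpretation is implementation-defined; three or more
 * leading slashes are treated as one.
 */
char *collapse_slashes(char *path)
{
    size_t lead;      /* number of leading slashes */
    size_t in = 0;    /* read position */
    size_t out = 0;   /* write position */

    assert(path != NULL);

    lead = strspn(path, "/");
    if (lead == 2) {
        /* Keep "//" verbatim and continue after it. */
        in = 2;
        out = 2;
    }

    for (; path[in] != '\0'; in++) {
        /* Skip a slash that directly follows a slash already kept. */
        if (path[in] == '/' && out > 0 && path[out - 1] == '/')
            continue;
        path[out++] = path[in];
    }
    path[out] = '\0';
    return path;
}
```
-
- Under these assumptions, "a//b///c/" becomes "a/b/c/" and "///usr//bin"
- becomes "/usr/bin", while "//host/share" is returned unchanged.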
-
-
- 2.2.2.103 pathname component: See filename in 2.2.2.61. [POSIX.1 {8}]
-
- 2.2.2.104 pathname resolution: A concept of the underlying system, as
- follows. [POSIX.1 {8}]
-
- Pathname resolution is performed for a process to resolve a pathname to a
- particular file in a file hierarchy. There may be multiple pathnames
- that resolve to the same file.
-
- Each filename in the pathname is located in the directory specified by
- its predecessor (for example, in the pathname fragment ``a/b'', file
- ``b'' is located in directory ``a''). Pathname resolution fails if this
- cannot be accomplished. If the pathname begins with a slash, the
- predecessor of the first filename in the pathname is taken to be the root
- directory of the process (such pathnames are referred to as ``absolute
- pathnames''). If the pathname does not begin with a slash, the predecessor
- of the first filename of the pathname is taken to be the current working
- directory of the process (such pathnames are referred to as ``relative
- pathnames'').
-
- The interpretation of a pathname component is dependent on the values of
- {NAME_MAX} and {_POSIX_NO_TRUNC} associated with the path prefix of that
- component. If any pathname component is longer than {NAME_MAX}, and
- {_POSIX_NO_TRUNC} is in effect for the path prefix of that component [see
- pathconf() in POSIX.1 {8} 5.7.1], the implementation shall consider this
- an error condition. Otherwise, the implementation shall use the first
- {NAME_MAX} bytes of the pathname component.
-
- The special filename dot refers to the directory specified by its
- predecessor. The special filename dot-dot refers to the parent directory
- of its predecessor directory. As a special case, in the root directory,
- dot-dot may refer to the root directory itself.
-
- A pathname consisting of a single slash resolves to the root directory of
- the process. A null pathname is invalid.
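-
- As a rough illustration of two of the rules above, the following C sketch
- picks the starting point of resolution and applies the {NAME_MAX} rule to
- one component. The names pathname_kind, classify_pathname, and
- effective_component_length are hypothetical. A real caller would be
- expected to obtain the {NAME_MAX} and {_POSIX_NO_TRUNC} values for the
- path prefix from pathconf(); here they are simply passed in as arguments.
-
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a pathname by its first character. */
enum pathname_kind {
    PATHNAME_INVALID,    /* null (empty) pathname */
    PATHNAME_ABSOLUTE,   /* resolution starts at the root directory */
    PATHNAME_RELATIVE    /* resolution starts at the working directory */
};

enum pathname_kind classify_pathname(const char *path)
{
    assert(path != NULL);

    if (path[0] == '\0')
        return PATHNAME_INVALID;
    if (path[0] == '/')
        return PATHNAME_ABSOLUTE;
    return PATHNAME_RELATIVE;
}

/*
 * Returns the number of bytes of a pathname component that take part
 * in resolution, or -1 if the component is an error condition.
 * `name_max` and `no_trunc` stand in for the {NAME_MAX} and
 * {_POSIX_NO_TRUNC} values associated with the path prefix.
 */
long effective_component_length(size_t len, long name_max, int no_trunc)
{
    assert(name_max > 0);

    if (len <= (size_t)name_max)
        return (long)len;        /* fits: used as is */
    if (no_trunc)
        return -1;               /* too long and truncation not allowed */
    return name_max;             /* only the first {NAME_MAX} bytes count */
}
```
-
- For example, with a limit of 14 bytes a 20-byte component is either
- rejected or treated as its first 14 bytes, depending on whether
- {_POSIX_NO_TRUNC} is in effect.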
-
-
- 2.2.2.105 path prefix: A pathname, with an optional ending slash, that
- refers to a directory. [POSIX.1 {8}]
-
- 2.2.2.106 pattern: A sequence of characters used either with regular
- expression notation (see 2.8) or for pathname expansion (see 3.6.6), as a
- means of selecting various character strings or pathnames, respectively.
-
- The syntaxes of the two patterns are similar, but not identical; this
- standard always indicates the type of pattern being referred to in the
- immediate context of the use of the term.
-
-
- 2.2.2.107 period: The character ``.''.
-
- The term period is contrasted against dot (2.2.2.38), which is used to
- describe a specific directory entry.
-
- 2.2.2.108 permissions: See file access permissions in 2.2.2.55.
-
-
- 2.2.2.109 pipe: An object accessed by one of the pair of file
- descriptors created by the POSIX.1 {8} pipe() function.
-
- Once created, the file descriptors can be used to manipulate it, and it
- behaves identically to a FIFO special file when accessed in this way. It
- has no name in the file hierarchy. [POSIX.1 {8}]
-
- 2.2.2.110 portable character set: The set of characters described in
- 2.4 that is supported on all conforming systems.
-
- This term is contrasted against the smaller portable filename character
- set; see 2.2.2.111.
-
-
- 2.2.2.111 portable filename character set: The set of characters from
- which portable filenames are constructed.
-
- For a filename to be portable across conforming implementations of this
- standard, it shall consist only of the following characters:
-
- A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
- a b c d e f g h i j k l m n o p q r s t u v w x y z
- 0 1 2 3 4 5 6 7 8 9 . _ -
-
- The last three characters are the period, underscore, and hyphen
- characters, respectively. The hyphen shall not be used as the first
- character of a portable filename. Upper- and lowercase letters shall
- retain their unique identities between conforming implementations. In
- the case of a portable pathname, the slash character may also be used.
- [POSIX.1 {8}]
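-
- A check against this character set might be sketched in C as follows.
- The function name is_portable_filename is hypothetical. The sketch
- treats an empty string as not portable (an assumption) and does not check
- any length limit.
-
```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if `name` consists only of characters
 * from the portable filename character set and does not begin with a
 * hyphen, otherwise 0. Slash is deliberately excluded, because this
 * checks a single filename rather than a pathname.
 */
int is_portable_filename(const char *name)
{
    static const char allowed[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789._-";

    assert(name != NULL);

    if (name[0] == '\0' || name[0] == '-')
        return 0;

    /* strspn() yields the length of the leading run of allowed bytes. */
    return name[strspn(name, allowed)] == '\0';
}
```
-
- With this sketch, "a.out" and "read_me-1" are accepted, while "-rf",
- "my file", and "dir/file" are rejected.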
-
-
- 2.2.2.112 printable character: One of the characters included in the
- print character classification of the LC_CTYPE category in the current
- locale; see 2.5.2.1.
-
- 2.2.2.113 privilege: See appropriate privileges in 2.2.2.6.
- [POSIX.1 {8}]
-
-
- 2.2.2.114 process: An address space and single thread of control that
- executes within that address space, and its required system resources.
-
- A process is created by another process issuing the POSIX.1 {8} fork()
- function. The process that issues fork() is known as the parent process,
- and the new process created by the fork() is known as the child process.
- [POSIX.1 {8}]
-
- The attributes of processes required by POSIX.2 form a subset of those in
- POSIX.1 {8}; see 2.9.1.
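-
- The parent and child relationship can be observed with a small C sketch
- built on the POSIX.1 {8} fork(), getpid(), getppid(), and waitpid()
- functions. The function name child_sees_parent is hypothetical, and the
- sketch assumes the parent stays alive until the child has exited.
-
```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: forks a child and reports whether the child's
 * parent process ID equals the caller's process ID.
 * Returns 1 if it does, 0 if it does not, and -1 on error.
 */
int child_sees_parent(void)
{
    pid_t parent = getpid();
    pid_t child;
    int status;

    assert(parent > 0);   /* a process ID is a positive integer */

    child = fork();
    if (child == (pid_t)-1)
        return -1;        /* no child process was created */

    if (child == 0) {
        /* Child process: report the comparison through the exit status. */
        _exit(getppid() == parent ? 0 : 1);
    }

    /* Parent process: wait for the child and decode its exit status. */
    if (waitpid(child, &status, 0) != child)
        return -1;
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```
-
- Because the parent waits for the child, the creator's lifetime has not
- ended when the child calls getppid(), so the two IDs are expected to
- match.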
-
- 2.2.2.115 process group: A collection of processes that permits the
- signaling of related processes.
-
- Each process in the system is a member of a process group that is
- identified by a process group ID. A newly created process joins the
- process group of its creator. [POSIX.1 {8}]
-
-
- 2.2.2.116 process group ID: The unique identifier representing a
- process group during its lifetime.
-
- A process group ID is a positive integer that can be contained in a
- pid_t. It shall not be reused by the system until the process group
- lifetime ends. [POSIX.1 {8}]
-
- 2.2.2.117 process group leader: A process whose process ID is the same
- as its process group ID. [POSIX.1 {8}]
-
-
- 2.2.2.118 process ID: The unique identifier representing a process.
-
- A process ID is a positive integer that can be contained in a pid_t. A
- process ID shall not be reused by the system until the process lifetime
- ends. In addition, if there exists a process group whose process group
- ID is equal to that process ID, the process ID shall not be reused by the
- system until the process group lifetime ends. A process that is not a
- system process shall not have a process ID of 1. [POSIX.1 {8}]
-
- 2.2.2.119 program: A prepared sequence of instructions to the system to
- accomplish a defined task.
-
- The term program in POSIX.2 encompasses applications written in the Shell
- Command Language, complex utility input languages (for example, awk, lex,
- sed, etc.), and high-level languages.
-
-
- 2.2.2.120 read-only file system: A file system that has
- implementation-defined characteristics restricting modifications.
- [POSIX.1 {8}]
-
- 2.2.2.121 real group ID: The attribute of a process that, at the time
- of process creation, identifies the group of the user who created the
- process.
-
- See group ID in 2.2.2.75. This value is subject to change during the
- process lifetime, as described in POSIX.1 {8} 4.2.2 [setgid()].
- [POSIX.1 {8}]
-
-
- 2.2.2.122 real user ID: The attribute of a process that, at the time of
- process creation, identifies the user who created the process.
-
- See user ID in 2.2.2.154. This value is subject to change during the
- process lifetime, as described in POSIX.1 {8} 4.2.2 [setuid()].
- [POSIX.1 {8}]
-
- 2.2.2.123 regular expression: A pattern (sequence of characters or
- symbols) constructed according to the rules defined in 2.8.
-
-
- 2.2.2.124 regular file: A file that is a randomly accessible sequence
- of bytes, with no further structure imposed by the system. [POSIX.1 {8}]
-
- 2.2.2.125 relative pathname: See pathname resolution in 2.2.2.104.
- [POSIX.1 {8}]
-
-
- 2.2.2.126 root directory: A directory, associated with a process, that
- is used in pathname resolution for pathnames that begin with a slash.
- [POSIX.1 {8}]
-
- 2.2.2.127 saved set-group-ID: An attribute of a process that allows
- some flexibility in the assignment of the effective group ID attribute,
- when the saved set-user-ID option is implemented, as described in
- POSIX.1 {8} 3.1.2 (exec) and 4.2.2 [setgid()]. [POSIX.1 {8}]
-
-
- 2.2.2.128 saved set-user-ID: An attribute of a process that allows some
- flexibility in the assignment of the effective user ID attribute, when
- the saved set-user-ID option is implemented, as described in POSIX.1 {8}
- 3.1.2 and 4.2.2 [setuid()]. [POSIX.1 {8}]
-
- 2.2.2.129 seconds since the Epoch: A value to be interpreted as the
- number of seconds between a specified time and the Epoch.
-
- A Coordinated Universal Time name [specified in terms of seconds
- (tm_sec), minutes (tm_min), hours (tm_hour), days since January 1 of the
- year (tm_yday), and calendar year minus 1900 (tm_year)] is related to a
- time represented as seconds since the Epoch, according to the expression
- below.
-
- If the year < 1970 or the value is negative, the relationship is
- undefined. If the year >= 1970 and the value is nonnegative, the value is
- related to a Coordinated Universal Time name according to the expression:
-
- tm_sec + tm_min*60 + tm_hour*3600 + tm_yday*86400 +
- (tm_year-70)*31536000 + ((tm_year-69)/4)*86400
-
- [POSIX.1 {8}]
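-
- The expression can be transcribed into C directly. This is a minimal
- sketch: the function name seconds_since_epoch is hypothetical, the result
- type long long is a choice made here to avoid overflow, and the division
- is integer division as in the expression. The struct tm members are
- assumed to hold a Coordinated Universal Time name for 1970 or later.
-
```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper: evaluates the seconds-since-the-Epoch
 * expression for a broken-down UTC time. Only tm_sec, tm_min,
 * tm_hour, tm_yday, and tm_year are consulted.
 */
long long seconds_since_epoch(const struct tm *utc)
{
    assert(utc != NULL);
    assert(utc->tm_year >= 70);   /* undefined for years before 1970 */

    return utc->tm_sec
         + utc->tm_min * 60LL
         + utc->tm_hour * 3600LL
         + utc->tm_yday * 86400LL
         + (utc->tm_year - 70) * 31536000LL
         /* one extra day for each leap year completed since 1970 */
         + ((utc->tm_year - 69) / 4) * 86400LL;
}
```
-
- For midnight on January 1, 2000 (tm_year 100, all other members zero),
- the terms are 30*31536000 + (31/4)*86400 = 946080000 + 604800, which
- gives 946684800.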
-
-
- 2.2.2.130 session: A collection of process groups established for job
- control purposes.
-
- Each process group is a member of a session. A process is considered to
- be a member of the session of which its process group is a member. A
- newly created process joins the session of its creator. A process can
- alter its session membership (see POSIX.1 {8} 4.3.2 [setsid()]).
- Implementations that support the POSIX.1 {8} setpgid() function (see
- POSIX.1 {8} 4.3.3) can have multiple process groups in the same session.
- [POSIX.1 {8}]
-
- 2.2.2.131 session leader: A process that has created a session; see
- POSIX.1 {8} 4.3.2 [setsid()]. [POSIX.1 {8}]
-
-
- 2.2.2.132 session lifetime: The period between when a session is
- created and the end of the lifetime of all the process groups that remain
- as members of the session. [POSIX.1 {8}]
-
- 2.2.2.133 shell: A program that interprets sequences of text input as
- commands.
-
- It may operate on an input stream or it may interactively prompt and read
- commands from a terminal.
-
-
- 2.2.2.134 Shell, The: The Shell Command Language Interpreter (see
- 4.56), a specific instance of a shell.
-
- 2.2.2.135 shell script: A file containing shell commands.
-
- If the file is made executable, it can be executed by specifying its name
- as a simple command (see the description of simple command in 3.9.1).
- Execution of a shell script causes a shell to execute the commands within
- the script. Alternatively, a shell can be requested to execute the
- commands in a shell script by specifying the name of the shell script as
- the operand to the sh utility.
-
-
- 2.2.2.136 signal: A mechanism by which a process may be notified of, or
- affected by, an event occurring in the system.
-
- Examples of such events include hardware exceptions and specific actions
- by processes. The term signal is also used to refer to the event itself.
- [POSIX.1 {8}]
-
- 2.2.2.137 single-quote: The character ``''', also known as apostrophe.
-
-
- 2.2.2.138 slash: The character ``/'', also known as solidus.
-
- 2.2.2.139 source code: When dealing with the Shell Command Language,
- source code is input to the command language interpreter.
-
- The term shell script is synonymous with this meaning.
-
- When dealing with the C Language Bindings Option, source code is input to
- a C compiler conforming to the C Standard {7}.
-
- When dealing with another ISO/IEC conforming language, source code is
- input to a compiler conforming to that ISO/IEC standard.
-
- Source code also refers to the input statements prepared for the
- following standard utilities: awk, bc, ed, lex, localedef, make, sed,
- and yacc.
-
- Source code can also refer to a collection of sources meeting any or all
- of these meanings.
-
-
- 2.2.2.140 <space>: The character defined in 2.4 as <space>.
-
- The <space> character is a member of the space character class of the
- current locale, but represents the single character, and not all of the
- possible members of the class. (See 2.2.2.158.)
-
- 2.2.2.141 standard error: An output stream usually intended to be used
- for diagnostic messages.
-
-
- 2.2.2.142 standard input: An input stream usually intended to be used
- for primary data input.
-
- 2.2.2.143 standard output: An output stream usually intended to be used
- for primary data output.
-
-
- 2.2.2.144 standard utilities: The utilities defined by this standard,
- in Sections 4, 5, and 6, Annex A, and Annex C, and in similar
- sections of utility definitions introduced in future revisions of, and
- supplements to, this standard.
-
- 2.2.2.145 stream: An ordered sequence of characters, as described by
- the C Standard {7}.
-
-
- 2.2.2.146 supplementary group ID: An attribute of a process used in
- determining file access permissions.
-
- A process has up to {NGROUPS_MAX} supplementary group IDs in addition to
- the effective group ID. The supplementary group IDs of a process are set
- to the supplementary group IDs of the parent process when the process is
- created. Whether a process's effective group ID is included in or
- omitted from its list of supplementary group IDs is unspecified.
- [POSIX.1 {8}]
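-
- Since it is unspecified whether the effective group ID appears in the
- supplementary list, a portable program has to look. The following C
- sketch uses the POSIX.1 {8} getgroups() and getegid() functions; the
- function name egid_in_supplementary_groups is hypothetical, and the
- two-call pattern assumes the list does not change between the calls.
-
```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the effective group ID is present
 * in the supplementary group ID list of the calling process, 0 if it
 * is not, and -1 on error.
 */
int egid_in_supplementary_groups(void)
{
    gid_t egid = getegid();
    gid_t *list;
    int n, got, i, found = 0;

    /* A size of 0 asks only for the number of supplementary group IDs. */
    n = getgroups(0, NULL);
    if (n < 0)
        return -1;
    if (n == 0)
        return 0;

    list = malloc((size_t)n * sizeof *list);
    if (list == NULL)
        return -1;

    got = getgroups(n, list);
    if (got < 0) {
        free(list);
        return -1;
    }
    assert(got <= n);   /* cannot report more entries than the buffer holds */

    for (i = 0; i < got; i++) {
        if (list[i] == egid)
            found = 1;
    }
    free(list);
    return found;
}
```
-
- Either answer is conforming; the sketch only reports which choice the
- implementation made for the calling process.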
-
- 2.2.2.147 system: An implementation of this standard.
-
-
- 2.2.2.148 <tab>: The horizontal tab character.
-
- 2.2.2.149 terminal [terminal device]: A character special file that
- obeys the specifications of the POSIX.1 {8} General Terminal Interface.
- [POSIX.1 {8}]
-
-
- 2.2.2.150 text column: A roughly rectangular block of characters
- capable of being laid out side-by-side next to other text columns on an
- output page or terminal screen.
-
- The widths of text columns are measured in column positions.
-
- 2.2.2.151 text file: A file that contains characters organized into one
- or more lines.
-
- The lines shall not contain NUL characters and none shall exceed
- {LINE_MAX} bytes in length, including the <newline>. Although
- POSIX.1 {8} does not distinguish between text files and binary files (see
- the C Standard {7}), many utilities only produce predictable or
- meaningful output when operating on text files. The standard utilities
- that have such restrictions always specify text files in their Standard
- Input or Input Files subclauses.
-
-
- 2.2.2.152 tilde: The character ``~''.
-
- 2.2.2.153 user database: See Section 9 in POSIX.1 {8}.
-
-
- 2.2.2.154 user ID: A nonnegative integer, which can be contained in an
- object of type uid_t, that is used to identify a system user.
-
- When the identity of a user is associated with a process, a user ID value
- is referred to as a real user ID, an effective user ID, or an (optional)
- saved set-user-ID. [POSIX.1 {8}]
-
- 2.2.2.155 user name: A string that is used to identify a user, as
- described in POSIX.1 {8} 9.1. [POSIX.1 {8}]
-
-
- 2.2.2.156 utility: A program that can be called by name from a shell to
- perform a specific task, or related set of tasks.
-
- This program shall either be an executable file, such as might be
- produced by a compiler/linker system from computer source code, or a file
- of shell source code, directly interpreted by the shell. The program may
- have been produced by the user, provided by the implementor of this
- standard, or acquired from an independent distributor. The term utility
- does not apply to the special built-in utilities provided as part of the
- shell command language; see 3.14. The system may implement certain
- utilities as shell functions (see 3.9.5) or built-ins (see 2.3), but only
- an application that is aware of the command search order described in
- 3.9.1.1 or of performance characteristics can discern differences between
- the behavior of such a function or built-in and that of a true executable
- file.
-
-
- 2.2.2.157 <vertical-tab>: The vertical tab character.
-
- 2.2.2.158 white space: A sequence of one or more characters that belong
- to the space character class as defined via the LC_CTYPE category in the
- current locale.
-
- In the POSIX Locale, white space consists of one or more <blank>s
- (<space>s and <tab>s), <newline>s, <carriage-return>s, <form-feed>s, and
- <vertical-tab>s.
-
-
- 2.2.2.159 working directory [current working directory]: A directory,
- associated with a process, that is used in pathname resolution for
- pathnames that do not begin with a slash.
-
- 2.2.2.160 write: To output characters to a file, such as standard
- output or standard error.
-
- Unless otherwise stated, standard output is the default output
- destination for all uses of the term write.
-
- BEGIN_RATIONALE
-
-
- 2.2.2.161 General Terms Rationale. (This subclause is not a part of
- P1003.2)
-
- Many of the terms originated in POSIX.1 {8} and are duplicated in this
- standard to meet editorial requirements. In some cases, there is
- supplementary text that presents additional information concerning
- POSIX.2 aspects of the concept.
-
- This standard uses the term character to mean a sequence of one or more
- bytes representing a single graphic symbol, as defined in POSIX.1 {8}.
- The deviation in the exact text of the C Standard {7} definition for byte
- meets the intent of the C Standard {7} Rationale and the developers of
- POSIX.1 {8}, but clears up the ambiguity raised by the term basic
- execution character set, which is not defined in POSIX.1 {8}. It is
- expected that a future version of POSIX.1 {8} will align with the text
- used here. The octet-minimum requirement is merely a reflection of the
- {CHAR_BIT} value in POSIX.1 {8} and the C Standard {7}.
-
- The POSIX.1 {8} term file mode is a superset of the POSIX.2 file mode
- bits. POSIX.1 {8} defines the file mode as the entire mode_t object
- (which includes the file type, historically in the upper four bits, the
- sticky bit on most implementations, and potentially other nonstandardized
- attributes), while POSIX.2 file mode bits include only the eleven defined
- bits.
-
- The terms command and utility are related but have distinct meanings.
- Command is defined as ``a directive to a shell to perform a specific
- task.'' The directive can be in the form of a single utility name (for
- example, ls), or the directive can take the form of a compound command
- (for example, ls | grep name | pr).
-
- A utility is a program that is callable by name from a shell. Issuing
- only the utility's name to a shell is the equivalent of a one-word
- command. A utility may be invoked as a separate program that executes in
- a different process than the command language interpreter, or may be
- implemented as a part of the command language interpreter. For example,
- the echo command (the directive to perform a specific task) may be
- implemented such that the echo utility (the logic that performs the task
- of echoing) is in a separate program and is therefore executed in a
- process different from that of the command language interpreter.
- Conversely, the logic that performs the echo utility could be built into
- the command language interpreter and therefore execute in the same
- process as the command language interpreter.
-
- The terms tool and application can be thought of as being synonymous with
- utility from the perspective of the operating system kernel. Tools,
- applications, and utilities have historically run, typically, in
- processes above the kernel level. Tools and utilities have been
- historically a part of the operating system nonkernel code, and performed
- system related functions such as listing directory contents, checking
- file systems, repairing file systems, or extracting system status
- information. Applications have not generally been a part of the
- operating system, and perform nonsystem related functions such as word
- processing, architectural design, mechanical design, workstation
- publishing, or financial analysis. Utilities have most frequently been
- provided by the operating system vendor, applications by third party
- software vendors or by the users themselves. Nevertheless, the standard
- does not differentiate between tools, utilities, and applications when it
- comes to receiving services from the system, a shell, or the standard
- utilities. (For example, the xargs utility invokes another utility; it
- would be of fairly limited usefulness if the users couldn't run their own
- applications in place of the standard utilities.) Utilities are not
- applications in the sense that they are not themselves subject to the
- restrictions of this standard or any other standard--there is no
- requirement for grep, stty, or any of the utilities defined here to be
- any of the classes of Conforming POSIX.2 Applications.
-
- The term text file does not prevent the inclusion of control or other
- nonprintable characters (other than NUL). Therefore, standard utilities
- that list text files as inputs or outputs are either able to process the
- special characters gracefully or they explicitly describe their
- limitations within their individual subclauses. The definition of text
- file has caused a good deal of controversy. The only difference between
- text and binary here is that text files have lines of (less than
- {LINE_MAX}) bytes, with no NUL characters, each terminated by a <newline>
- character. The definition allows a file with a single <newline>, but not
- a totally empty file, to be called a text file. If a file ends with an
- incomplete line it is not strictly a text file by this definition. A
- related point is that the <newline> character referred to in this
- standard is not some generic line separator, but a single character;
- files created on systems that use multiple characters for ends of
- lines are not portable to all POSIX systems without some translation
- process unspecified by this standard.
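-
- The consequences spelled out in this paragraph can be checked
- mechanically. The following C sketch works on a buffer already read into
- memory; the function name is_text_file and its parameters are
- hypothetical, and the caller is assumed to supply the {LINE_MAX} value in
- effect as line_max.
-
```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the `len` bytes at `buf` form a
 * text file in the sense described above, otherwise 0.
 *   - an empty buffer has no lines, so it does not qualify;
 *   - no byte may be NUL;
 *   - no line, counting its <newline>, may exceed `line_max` bytes;
 *   - the last line must be terminated by a <newline>.
 */
int is_text_file(const char *buf, size_t len, size_t line_max)
{
    size_t i;
    size_t line_len = 0;   /* bytes seen so far in the current line */

    assert(buf != NULL || len == 0);

    if (len == 0)
        return 0;

    for (i = 0; i < len; i++) {
        if (buf[i] == '\0')
            return 0;
        line_len++;
        if (line_len > line_max)
            return 0;
        if (buf[i] == '\n')
            line_len = 0;  /* line complete; start counting the next one */
    }

    /* A nonzero count here means the final line was incomplete. */
    return line_len == 0;
}
```
-
- A buffer holding only a <newline> is accepted, while an empty buffer or
- one whose last line lacks a <newline> is rejected, matching the cases
- discussed in the rationale.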
-
- The term hard link is historically derived. In systems without
- extensions to ln, it is a synonym for link. The concept of a symbolic
- link originated with BSD systems and the term hard is used to
- differentiate between the two types of links.
-
- There are some terms used that are undefined in POSIX.2, POSIX.1 {8}, or
- the C Standard {7}. The working group believes that these terms have a
- ``common usage,'' and that a definition in POSIX.2 would not be
- appropriate. Terms in this category include, but are not limited to, the
- following: application, character set, login session, user. Good
- sources for general terms of this type are the ISO/AFNOR Dictionary of
- Computer Science {B12} and IEEE Dictionary {B18}.
-
- The term file name was defined in previous drafts to be a synonym for
- pathname. It was removed in the face of objections that it was too close
- to filename, which means something different (a pathname component). The
- general solution to this has been to use the term file in parameter
- names, rather than file_name, and to make more liberal use of the correct
- term, pathname; an alternate solution has been to replace file name with
- the name of the file.
-
- Many character names are included in this subclause. Because of
- historical usage, some of these names differ slightly from the ones
- used in international standards for character sets, such as ISO/IEC 646
- {1}. It was felt that many more UNIX system people than character set
- lawyers would be reading and reviewing the standard, so the former group
- was the one accommodated. On the other hand, the precise definitions of
- <space>, <blank>, and white space have replaced common usage (where they
- have been used virtually interchangeably), as the standard attempts to
- balance readability against precision.
-
- In earlier drafts, the names for the character pairs ( ), [ ], and { }
- were referred to as ``opening'' and ``closing'' parentheses, brackets,
- and braces. These were changed to the current ``left'' and ``right.''
- When the characters are used to express natural language, the terms
- ``open'' and ``close'' imply text direction more strongly than ``left''
- and ``right.'' By POSIX.2 definition, the character <open-parenthesis>
- will always be mapped to the glyph '(' regardless of the locale. But
- when reading right-to-left, the opening punctuation of a parenthesized
- text segment would be ')'. The <left-parenthesis> and <right-
- parenthesis> forms are the correct ones because the punctuation appears
- on the left and right, respectively, of the parenthesized text regardless
- of the direction one might be reading the text.
-
- The <backspace> character and the ERASE special character defined in
- POSIX.1 {8} should not be confused. The use of the <backspace> character
- and the ERASE special character defined in the POSIX.1 {8} termios clause
- on special characters (7.1.1.9) are distinct even though the ERASE
- special character may be set to <backspace>.
-
- In most one-byte character sets, such as ASCII, the concept of column
- positions is identical to character positions and to bytes. Therefore,
- it has been historically acceptable for some implementations to describe
- line folding or tab stops or table column alignment in terms of bytes or
- character positions. Other character sets pose complications, as they
- can have internal representations longer than one octet and they can have
- displayable characters that have different widths on the terminal screen
- or printer.
-
- In this standard the term column positions has been defined to mean
- character--not byte--positions in input files (such as ``column position
- 7 of the FORTRAN input''). Output files describe the column position in
- terms of the display width of the narrowest printable character in the
- character set, adjusted to fit the characteristics of the output device.
- It is very possible that n column positions will not be able to hold n
- characters in some character sets, unless all of those characters are of
- the narrowest width. It is assumed that the implementation is aware of
- the width of the various characters, deriving this information from the
- value of LC_CTYPE, and thus can determine how many column positions to
- allot for each character in those utilities where it is important. This
- information is not available to the portable application writer because
- POSIX.2 provides no interface specification to retrieve such information.
-
- The term column position was used instead of the more natural column as
- the latter is frequently used in the standard in the different contexts
- of columns of figures, columns of table values, etc. Wherever confusion
- might result, these latter types of columns are referred to as text
- columns.
-
- The definition of binary file was removed, as the term is not used in the
- standard.
-
- The ISO/IEC 646 {1} character set standard permits substitution of
- national currency symbols for the character $ in the ``reference
- character set'' (which is the same as ASCII). This standard permits the
- substitution only of the actual characters shown in ISO/IEC 646 {1}:
- currency sign for the dollar sign and pound sign for the number sign.
- This document uses the latter names and their symbols, but it is valid
- for an implementation to accept, for instance, the pound sign (£) as a
- comment character in the shell, if that is what the locale's character
- set uses instead of the number sign (#). Other variations of national
- currency symbols are not allowed, per the request of the WG15 POSIX
- working group.
-
- The term stream is not related to System V's STREAMS communications
- facility; it is derived from historical UNIX system usage and has been
- made official by the C Standard {7}. The POSIX.2 standard makes no
- differentiation between C's text stream and binary stream.
-
- The formula used in the POSIX.1 {8} definition of seconds since the Epoch
- is not perfect in all cases. See the related rationale in POSIX.1 {8}.
-
- END_RATIONALE 1
-
- 2.2.3 Abbreviations
-
- For the purposes of this standard, the following abbreviations apply:
-
-
- 2.2.3.1 C Standard: ISO/IEC 9899: ..., Information processing
- systems--Programming languages--C {7}.
-
- 2.2.3.2 ERE: An Extended Regular Expression, as defined in 2.8.4.
-
-
- 2.2.3.3 LC_*: An abbreviation used to represent all of the environment
- variables named in 2.6 whose names begin with the characters ``LC_''.
-
- 2.2.3.4 POSIX.1: ISO/IEC 9945-1: 1990: Information technology--Portable
- Operating System Interface (POSIX)--Part 1: System Application Program
- Interface (API) [C Language] {8}.
-
-
- 2.2.3.5 POSIX.2: This standard.
-
- 2.2.3.6 RE [BRE]: A Basic Regular Expression, as defined in 2.8.3.
-
- 2.3 Built-in Utilities
-
- Any of the standard utilities may be implemented as regular built-in
- utilities within the command language interpreter. This is usually done
- to increase the performance of frequently-used utilities or to achieve
- functionality that would be more difficult in a separate environment.
- The utilities named in Table 2-2 are frequently provided in built-in
- form. All of the utilities named in the table have special properties in
- terms of command search order within the shell, as described in 3.9.1.1.
-
-
- Table 2-2 - Regular Built-in Utilities
- ________________________________________________
-
- cd        false     kill      true      wait
- command   getopts   read      umask
- ________________________________________________
-
-
- However, all of the standard utilities, including the regular built-ins
- in the table, but not the special built-ins described in 3.14, shall be
- implemented in a manner so that they can be accessed via the POSIX.1 {8}
- exec family of functions (if the underlying operating system provides the
- services of such a family to application programs) and can be invoked
- directly by those standard utilities that require it (env, find, nohup,
- xargs).
-
- Since versions shall be provided for all utilities except for those
- listed previously, an application running on a system that conforms to
- both POSIX.1 {8} and Section 7 of this standard can use the exec family
- of functions, in addition to the shell command interface in 7.1 [such as
- the system() and popen() functions in the C binding] defined by this
- standard, to execute any of these utilities.
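-
- As an illustration only, the following Python sketch invokes two of the
- regular built-ins from Table 2-2 through the exec family rather than
- through a shell. It assumes a POSIX-like host on which executable
- versions of true and false can be found via PATH; the helper name
- run_via_exec is hypothetical and is not defined by this standard.
-
```python
import os


def run_via_exec(argv):
    """Fork, exec argv[0] via a PATH search, and return the exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace the process image directly; no shell is involved.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            # exec failed (for example, no executable version was found).
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Report termination by a signal as a negative number.
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    print("true  ->", run_via_exec(["true"]))
    print("false ->", run_via_exec(["false"]))
```
-
- The status returned here is that of the utility itself, because no
- command language interpreter sits between the caller and the utility.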
-
- BEGIN_RATIONALE
-
-
- 2.3.1 Built-in Utilities Rationale. (This subclause is not a part of
- P1003.2)
-
- In earlier drafts, the table of built-ins implied two things to a
- conforming application: these may be built-ins and these need not be
- executable. The second implication has now been removed and all
- utilities can be exec-ed. There is no requirement that these be actually
- built into the shell itself, but many shells will want to do so because
- 3.9.1.1 requires that they be found prior to the PATH search. The shell
- could satisfy its requirements by keeping a list of the names and
- directly accessing the file-system versions regardless of PATH.
- Providing all of the required functionality for those such as cd or read
- would be more difficult.
-
- There were originally three justifications for allowing the omission of
- exec-able versions:
-
- (1) This would require wasting space in the file system, at the
- expense of very small systems. However, it has been pointed out
- that all nine in the table can be provided with nine links to a
- single-line shell script:
-
- $0 "$@"
-
- (2) There is no sense in requiring invocation of utilities like cd
- because they have no value outside the shell environment or
- cannot be useful in a child process. However, counter-examples
- always seemed to be available for even the strangest cases:
-
- find . -type d -exec cd {} \; -exec foo {} \;
- (which invokes foo on accessible directories)
-
- ps ... | sed ... | xargs kill
-
- find . -exec true \; -a ...
- (where true is used for temporary debugging)
-
- (3) It is confusing to have something such as kill that can easily
- be in the file system in the base standard, but requires built-
- in status for the UPE (for the % job control job ID notation).
- It was decided that it was more appropriate to describe the
- required functionality (rather than the implementation) to the
- system implementors and let them decide how to satisfy it.
-
- On the other hand, there were objections raised during balloting that any
- distinction like this between utilities was not useful to applications
- and that the cost to correct it was small. These arguments were
- ultimately the most effective.
-
- There were varying reasons for including utilities in the table of
- built-ins:
-
- cd, getopts, read, umask, wait
- The functionality of these utilities is performed more
- simply within the context of the current process. An
- example can be taken from the usage of the cd utility.
- The purpose of the utility is to change the working
- directory for subsequent operations. The actions of cd
- affect the process in which cd is executed and all
- subsequent child processes of that process. Based on the
- POSIX.1 {8} process model, changes in the process
- environment of a child process have no effect on the
- parent process. If the cd utility were executed from a
- child process, the working directory change would be
- effective only in the child process. Child processes
- initiated subsequent to the child process that executed
- the cd utility would not have a changed working directory
- relative to the parent process.
-
- command This utility was placed in the table primarily to protect
- scripts that are concerned about their PATH being
- manipulated. The ``secure'' shell script example in
- 4.12.10 would not be possible if a PATH change retrieved
- an alien version of command. (An alternative would have
- been to implement getconf as a built-in, but it was felt
- that it carried too many changing configuration strings to
- require in the shell.)
-
- kill Since common extensions to kill (including the planned
- User Portability Extension) provide optional job control
- functionality using shell notation (%1, %2, etc.), some
- implementations would find it extremely difficult to
- provide this outside the shell.
-
- true, false
- These are in the table as a courtesy to programmers who
- wish to use the ``while true'' shell construct without
- protecting true from PATH searches. (It is acknowledged
- that ``while :'' also works, but the idiom with true is
- historically pervasive.)
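-
- The point made above for cd can be observed directly. The following
- Python sketch is illustrative only: it assumes a POSIX-like host with an
- sh utility on PATH, and the function name cwd_after_child_cd is
- hypothetical. It runs cd in a child shell and then reports the working
- directory of the calling process before and after.
-
```python
import os
import subprocess


def cwd_after_child_cd(target="/"):
    """Run `cd target` in a child shell; return the parent's cwd (before, after)."""
    before = os.getcwd()
    # The child shell changes its own working directory and then exits.
    subprocess.run(["sh", "-c", 'cd "$1"', "sh", target], check=True)
    after = os.getcwd()
    return before, after


if __name__ == "__main__":
    before, after = cwd_after_child_cd("/")
    print("parent before:", before)
    print("parent after: ", after)
```
-
- The two values are the same: the change made in the child process is
- lost when the child exits, which is why cd is performed more simply
- within the context of the current process.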
-
- All utilities, including those in the table, are accessible via the
- functions in 7.1.1 or 7.1.2 [such as system() or popen()]. There are
- situations where the return functionality of system() and popen() is not
- desirable. Applications that require the exit status of the invoked
- utility will not be able to use system() or popen(), since the exit
- status returned is that of the command language interpreter rather than
- that of the invoked utility. The alternative for such applications is
- the use of the exec family. (The text concerning conformance to
- POSIX.1 {8} was included because where exec is not provided in the
- underlying system, there is no way to require that utilities be
- exec-able.)
-
- END_RATIONALE
-
- 2.4 Character Set
-
- Conforming implementations shall support one or more coded character
- sets. Each supported coded character set shall include the portable
- character set specified in Table 2-3. The table defines the characters
- in the portable character set and the corresponding symbolic character
- names used to identify each character in a character set description
- file. The names are chosen to correspond closely with character names
- defined in other international standards. The table contains more than
- one symbolic character name for characters whose traditional name differs
- from the chosen name.
-
- This standard places only the following requirements on the encoded
- values of the characters in the portable character set:
-
- (1) If the encoded values associated with each member of the
- portable character set are not invariant across all locales
- supported by the implementation, the results achieved by an
- application accessing those locales are unspecified.
-
- (2) The encoded values associated with the digits '0' to '9' shall
- be such that the value of each character after '0' shall be one
- greater than the value of the previous character.
-
- (3) A null character, NUL, which has all bits set to zero, shall be
- in the set of characters.
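-
- Requirements (2) and (3) can be checked mechanically for a given coded
- character set. The following Python sketch is illustrative only; it
- uses Python's own codec names ("ascii" and the EBCDIC variant "cp037")
- as stand-ins for coded character sets, and the function names are
- hypothetical.
-
```python
def digits_are_contiguous(codec):
    """True if each digit after '0' encodes to one more than the previous digit."""
    values = [int.from_bytes(str(d).encode(codec), "big") for d in range(10)]
    return all(values[i + 1] == values[i] + 1 for i in range(9))


def nul_is_all_zero_bits(codec):
    """True if the null character encodes to a single byte with all bits zero."""
    return "\0".encode(codec) == b"\x00"


if __name__ == "__main__":
    for codec in ("ascii", "cp037"):
        print(codec, digits_are_contiguous(codec), nul_is_all_zero_bits(codec))
```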
-
- Conforming implementations shall support certain character and character
- set attributes, as defined in 2.5.1.
-
-
- 2.4.1 Character Set Description File
-
- Implementations shall provide a character set description file for at
- least one coded character set supported by the implementation. These
- files are referred to elsewhere in this standard as charmap files. It is
- implementation defined whether or not users or applications can provide
- additional character set description files. If such a capability is
- supported, the system documentation shall describe the rules for the
- creation of such files.
-
- Each character set description file shall define characteristics for the
- coded character set and the encoding for the characters specified in
- Table 2-3, and may define encoding for additional characters supported by
- the implementation. Other information about the coded character set may
- also be in the file. Coded character set character values shall be
- defined using symbolic character names followed by character encoding
- values.
-
- Table 2-3 - Character Set and Symbolic Names
- ________________________________________________________________________________
- Symbolic                  Symbolic                     Symbolic
- Name               Glyph  Name                  Glyph  Name                  Glyph
- ________________________________________________________________________________
-
- <NUL>                     <colon>                 :    <circumflex>           ^
- <alert>                   <semicolon>             ;    <circumflex-accent>    ^
- <backspace>               <less-than-sign>        <    <underscore>           _
- <tab>                     <equals-sign>           =    <low-line>             _
- <newline>                 <greater-than-sign>     >    <grave-accent>         `
- <vertical-tab>            <question-mark>         ?    <a>                    a
- <form-feed>               <commercial-at>         @    <b>                    b
- <carriage-return>         <A>                     A    <c>                    c
- <space>                   <B>                     B    <d>                    d
- <exclamation-mark>   !    <C>                     C    <e>                    e
- <quotation-mark>     "    <D>                     D    <f>                    f
- <number-sign>        #    <E>                     E    <g>                    g
- <dollar-sign>        $    <F>                     F    <h>                    h
- <percent-sign>       %    <G>                     G    <i>                    i
- <ampersand>          &    <H>                     H    <j>                    j
- <apostrophe>         '    <I>                     I    <k>                    k
- <left-parenthesis>   (    <J>                     J    <l>                    l
- <right-parenthesis>  )    <K>                     K    <m>                    m
- <asterisk>           *    <L>                     L    <n>                    n
- <plus-sign>          +    <M>                     M    <o>                    o
- <comma>              ,    <N>                     N    <p>                    p
- <hyphen>             -    <O>                     O    <q>                    q
- <hyphen-minus>       -    <P>                     P    <r>                    r
- <period>             .    <Q>                     Q    <s>                    s
- <full-stop>          .    <R>                     R    <t>                    t
- <slash>              /    <S>                     S    <u>                    u
- <solidus>            /    <T>                     T    <v>                    v
- <zero>               0    <U>                     U    <w>                    w
- <one>                1    <V>                     V    <x>                    x
- <two>                2    <W>                     W    <y>                    y
- <three>              3    <X>                     X    <z>                    z
- <four>               4    <Y>                     Y    <left-brace>           {
- <five>               5    <Z>                     Z    <left-curly-bracket>   {
- <six>                6    <left-square-bracket>   [    <vertical-line>        |
- <seven>              7    <backslash>             \    <right-brace>          }
- <eight>              8    <reverse-solidus>       \    <right-curly-bracket>  }
- <nine>               9    <right-square-bracket>  ]    <tilde>                ~
- ________________________________________________________________________________
-
-
- Each symbolic name specified in Table 2-3 shall be included in the file
- and shall be mapped to a unique encoding value (except for those symbolic
- names that are shown with identical glyphs). If the control characters
- commonly associated with the symbolic names in Table 2-4 are supported by
- the implementation, the symbolic names and their corresponding encoding
- values shall be included in the file. Some of the values associated with
- the symbolic names in this table also may be contained in Table 2-3.
-
-
- Table 2-4 - Control Character Set
- ________________________________________________
-
- <ACK>   <DC2>   <ENQ>   <FS>    <IS4>   <SOH>
- <BEL>   <DC3>   <EOT>   <GS>    <LF>    <STX>
- <BS>    <DC4>   <ESC>   <HT>    <NAK>   <SUB>
- <CAN>   <DEL>   <ETB>   <IS1>   <RS>    <SYN>
- <CR>    <DLE>   <ETX>   <IS2>   <SI>    <US>
- <DC1>   <EM>    <FF>    <IS3>   <SO>    <VT>
- ________________________________________________
-
-
- The following declarations can precede the character definitions. Each
- shall consist of the symbol shown in the following list, starting in
- column 1, including the surrounding brackets, followed by one or more
- <blank>s, followed by the value to be assigned to the symbol.
-
- <code_set_name> The name of the coded character set for which the
- character set description file is defined. The
- characters of the name shall be taken from the set
- of characters with visible glyphs defined in
- Table 2-3.
-
- <mb_cur_max> The maximum number of bytes in a multibyte
- character. This shall default to 1.
-
- <mb_cur_min> An unsigned positive integer value that shall
- define the minimum number of bytes in a character
- for the encoded character set. The value shall be
- less than or equal to mb_cur_max. If not
- specified, the minimum number shall be equal to
- mb_cur_max.
-
- <escape_char> The escape character used to indicate that the
- characters following shall be interpreted in a
- special way, as defined later in this subclause.
- This shall default to backslash (\), which is the
- character glyph used in all the following text and
- examples, unless otherwise noted.
-
- <comment_char> The character that, when placed in column 1 of a
- charmap line, is used to indicate that the line
- shall be ignored. The default character shall be
- the number-sign (#).
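-
- The declarations above could be collected as sketched below in Python.
- This is a minimal illustration under stated assumptions, not a
- definitive parser: the function name parse_declarations and the
- representation of values as strings are hypothetical, and only the
- defaults stated in this subclause are applied.
-
```python
# Defaults stated in the text: mb_cur_max 1, backslash escape, '#' comment.
DEFAULTS = {"mb_cur_max": "1", "escape_char": "\\", "comment_char": "#"}
KNOWN = ("code_set_name", "mb_cur_max", "mb_cur_min",
         "escape_char", "comment_char")


def parse_declarations(lines):
    """Collect the declarations that precede the CHARMAP identifier line."""
    decl = dict(DEFAULTS)
    for line in lines:
        if line.startswith("CHARMAP"):
            break
        for key in KNOWN:
            symbol = "<" + key + ">"
            # The symbol starts in column 1 and is followed by <blank>s.
            following = line[len(symbol):len(symbol) + 1]
            if line.startswith(symbol) and following in (" ", "\t"):
                decl[key] = line[len(symbol):].strip()
    # If not specified, the minimum equals mb_cur_max.
    decl.setdefault("mb_cur_min", decl["mb_cur_max"])
    return decl


if __name__ == "__main__":
    header = ["<code_set_name> ISO8859-1", "<mb_cur_max> 1", "CHARMAP"]
    print(parse_declarations(header))
```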
-
- The character set mapping definitions shall be all the lines immediately
- following an identifier line containing the string CHARMAP starting in
- column 1, and preceding a trailer line containing the string END CHARMAP
- starting in column 1. Empty lines and lines containing a comment_char in
- the first column shall be ignored. Each noncomment line of the character
- set mapping definition (i.e., between the CHARMAP and END CHARMAP lines
- of the file) shall be in either of two forms:
-
- "%s %s %s\n", <symbolic-name>, <encoding>, <comments>
-
- or
-
- "%s...%s %s %s\n", <symbolic-name>, <symbolic-name>, <encoding>,
- <comments>
-
- In the first format, the line in the character set mapping definition
- defines a single symbolic name and a corresponding encoding. A symbolic
- name is one or more characters from the set shown with visible glyphs in
- Table 2-3, enclosed between angle brackets. A character following an
- escape character shall be interpreted as itself; for example, the
- sequence ``<\\\>>'' represents the symbolic name ``\>'' enclosed between
- angle brackets.
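-
- A line in the first format might be split into its parts as in the
- following Python sketch. It is illustrative only: the function name
- parse_mapping_line is hypothetical, the encoding is returned as
- uninterpreted text, and the defaults for the escape and comment
- characters are those given earlier in this subclause.
-
```python
def parse_mapping_line(line, escape_char="\\", comment_char="#"):
    """Split one charmap line into (symbolic_name, encoding, comment)."""
    if not line.strip() or line.startswith(comment_char):
        return None  # empty lines and comment lines are ignored
    if not line.startswith("<"):
        raise ValueError("symbolic name must start with '<'")
    name = []
    i = 1
    while i < len(line) and line[i] != ">":
        if line[i] == escape_char:
            # A character following the escape character stands for itself.
            i += 1
            if i >= len(line):
                break
        name.append(line[i])
        i += 1
    if i >= len(line):
        raise ValueError("unterminated symbolic name")
    rest = line[i + 1:].split(None, 1)
    if not rest:
        raise ValueError("missing encoding")
    comment = rest[1].strip() if len(rest) > 1 else ""
    return "".join(name), rest[0], comment


if __name__ == "__main__":
    print(parse_mapping_line("<A> \\x41 LATIN CAPITAL LETTER A"))
    print(parse_mapping_line("<\\\\\\>> \\d01"))
```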
-
- In the second format, the line in the character set mapping definition
- defines a range of one or more symbolic names. In this form, the
- symbolic names shall consist of zero or more nonnumeric characters from
- the set shown with visible glyphs in Table 2-3, followed by an integer
- formed by one or more decimal digits. The characters preceding the
- integer shall be identical in the two symbolic names, and the integer
- formed by the digits in the second symbolic name shall be equal to or
- greater than the integer formed by the digits in the first name. This
- shall be interpreted as a series of symbolic names formed from the common
- part and each of the integers between the first and the second integer,
- inclusive. As an example, <j0101>...<j0104> is interpreted as the
- symbolic names <j0101>, <j0102>, <j0103>, and <j0104>, in that order.
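-
- The expansion of a range of symbolic names can be sketched as follows
- in Python. The function name expand_name_range is hypothetical, and
- preserving the digit width of the first name (so that leading zeros
- survive, as in the example above) is an assumption of this sketch.
-
```python
import re

# Zero or more nonnumeric characters followed by a decimal integer.
_RANGE_NAME = re.compile(r"<(\D*)(\d+)>\Z")


def expand_name_range(first, last):
    """Expand <prefixN>...<prefixM> into the list of symbolic names."""
    m1 = _RANGE_NAME.match(first)
    m2 = _RANGE_NAME.match(last)
    if not m1 or not m2:
        raise ValueError("names must end in a decimal integer")
    if m1.group(1) != m2.group(1):
        raise ValueError("characters preceding the integer must be identical")
    low, high = int(m1.group(2)), int(m2.group(2))
    if high < low:
        raise ValueError("second integer must not be less than the first")
    prefix = m1.group(1)
    width = len(m1.group(2))  # assumption: keep the first name's digit width
    return [f"<{prefix}{n:0{width}d}>" for n in range(low, high + 1)]


if __name__ == "__main__":
    print(expand_name_range("<j0101>", "<j0104>"))
```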
-
- A character set mapping definition line shall exist for all symbolic
- names specified in Table 2-3, and shall define the coded character value
- that corresponds with the character glyph indicated in the table, or the
- coded character value that corresponds with the control character
- symbolic name. If the control characters commonly associated with the
- symbolic names in Table 2-4 are supported by the implementation, the
- symbolic name and the corresponding encoding value shall be included in
- the file. Additional unique symbolic names may be included. A coded
- character value can be represented by more than one symbolic name.
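-
- A charmap that has already been read into memory could be checked
- against these rules roughly as follows. This Python sketch is
- illustrative only: the mapping type (symbolic name to bytes), the
- function name check_charmap, and the short synonym list (a subset of
- the names shown with identical glyphs in Table 2-3) are assumptions.
-
```python
# Illustrative subset of the synonym pairs shown with identical glyphs.
SYNONYM_PAIRS = [
    ("slash", "solidus"),
    ("period", "full-stop"),
    ("hyphen", "hyphen-minus"),
    ("backslash", "reverse-solidus"),
]


def check_charmap(mapping, required_names, synonym_pairs=SYNONYM_PAIRS):
    """Return a list of problems found in mapping (symbolic name -> bytes)."""
    problems = []
    # Every required symbolic name needs a mapping definition line.
    for name in required_names:
        if name not in mapping:
            problems.append("missing <%s>" % name)
    # Names that share a glyph are expected to share a coded value.
    for first, second in synonym_pairs:
        if (first in mapping and second in mapping
                and mapping[first] != mapping[second]):
            problems.append("<%s> and <%s> differ" % (first, second))
    return problems


if __name__ == "__main__":
    sample = {"slash": b"/", "solidus": b"/", "period": b"."}
    print(check_charmap(sample, ["slash", "solidus", "period", "full-stop"]))
```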
-
- The encoding part shall be expressed as one (for single-byte character
- values) or more concatenated decimal, octal, or hexadecimal constants in
- the following formats:
-
- "%cd%d", <escape_char>, <decimal byte value>
-
- "%cx%x", <escape_char>, <hexadecimal byte value>
-
- "%c%o", <escape_char>, <octal byte value>
-
- Decimal constants shall be represented by two or three decimal digits,
- preceded by the escape character and the lowercase letter d; for example,
- \d05, \d97, or \d143. Hexadecimal constants shall be represented by two
- hexadecimal digits, preceded by the escape character and the lowercase
- letter x; for example, \x05, \x61, or \x8f. Octal constants shall be
- represented by two or three octal digits, preceded by the escape
- character; for example, \05, \141, or \217. In a portable charmap file,
- each constant shall represent an 8-bit byte. Implementations supporting
- other byte sizes may allow constants to represent values larger than
- those that can be represented in 8-bit bytes, and may allow additional
- digits in constants. When constants are concatenated for multibyte
- character values, they shall be of the same type, and interpreted in byte
- order from left to right. The manner in which constants are represented
- in the character is implementation defined. Omitting bytes from a
- multibyte character definition produces undefined results.
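-
- The three constant formats can be converted to byte values as in the
- following Python sketch, which handles only the portable case of 8-bit
- bytes. The function name parse_encoding is hypothetical, and rejecting
- values above 255 is a choice made for this sketch.
-
```python
import re


def parse_encoding(text, escape_char="\\"):
    """Convert concatenated charmap constants to a bytes object."""
    esc = re.escape(escape_char)
    # Decimal: d + 2-3 digits; hexadecimal: x + 2 digits; octal: 2-3 digits.
    token = re.compile(
        esc + r"(?:d(\d{2,3})|x([0-9A-Fa-f]{2})|([0-7]{2,3}))")
    values, kinds, pos = [], set(), 0
    while pos < len(text):
        m = token.match(text, pos)
        if not m:
            raise ValueError("bad constant at offset %d" % pos)
        dec, hexa, octal = m.groups()
        if dec is not None:
            value, kind = int(dec, 10), "decimal"
        elif hexa is not None:
            value, kind = int(hexa, 16), "hexadecimal"
        else:
            value, kind = int(octal, 8), "octal"
        if value > 255:
            raise ValueError("constant does not fit in an 8-bit byte")
        values.append(value)
        kinds.add(kind)
        pos = m.end()
    if not values:
        raise ValueError("empty encoding")
    if len(kinds) > 1:
        raise ValueError("concatenated constants must be of the same type")
    return bytes(values)  # bytes appear in left-to-right order


if __name__ == "__main__":
    for sample in ("\\d97", "\\x61", "\\141", "\\d129\\d254"):
        print(sample, list(parse_encoding(sample)))
```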
-
- In lines defining ranges of symbolic names, the encoded value is the
- value for the first symbolic name in the range (the symbolic name
- preceding the ellipsis). Subsequent symbolic names defined by the range
- shall have encoding values in increasing order. For example, the line
-
- <j0101>...<j0104> \d129\d254
-
- shall be interpreted as
-
- <j0101> \d129\d254
- <j0102> \d129\d255
- <j0103> \d130\d0
- <j0104> \d130\d1
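-
- The example above is consistent with treating the encoding as a
- multibyte integer, most significant byte first, that is incremented by
- one for each name in the range. The following Python sketch shows that
- reading; the function name range_encodings is hypothetical, and the
- carry rule is inferred from the example rather than stated in the text.
-
```python
def range_encodings(first, count):
    """Return count successive encodings, starting at first (bytes)."""
    width = len(first)
    start = int.from_bytes(first, "big")
    # to_bytes raises OverflowError if the range runs past the byte width.
    return [(start + i).to_bytes(width, "big") for i in range(count)]


if __name__ == "__main__":
    for encoding in range_encodings(bytes([129, 254]), 4):
        print("".join("\\d%d" % b for b in encoding))
```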
-
- The comment is optional.
-
- For the interpretation of the dollar-sign and the number-sign, see
- 2.2.2.37 and 2.2.2.93.
-
- BEGIN_RATIONALE
-
- 2.4.2 Character Set Rationale. (This subclause is not a part of P1003.2)
-
- The portable character set is listed in full so there is no dependency on
- the ISO/IEC 646 {1} (or historically ASCII) encoded character set,
- although the set is identical to the characters defined in the
- International Reference Version of ISO/IEC 646 {1}.
-
- This standard poses no requirement that multiple character sets or code
- sets be supported, leaving this as a marketing differentiation for
- implementors. Although multiple charmap files are supported, it is the
- responsibility of the implementation to provide the file(s); if only one
- is provided, only that one will be accessible using the localedef
- utility's -f option (although in the case of just one file on the system,
- -f is not useful).
-
- The statement about invariance in code sets for the portable character
- set is worded as it is to avoid precluding implementations where multiple
- incompatible code sets are available (say, ASCII and EBCDIC). The
- standard utilities cannot be expected to produce predictable results if
- they access portable characters that vary on the same implementation.
-
- The character set description file provides:
-
- - the capability to describe character set attributes (such as
- collation order or character classes) independent of character set
- encoding, and using only the characters in the portable character
- set. This makes it possible to create ``generic'' localedef source
- files for all code sets that share the portable character set (such
- as the ISO 8859 family or IBM Extended ASCII).
-
- - standardized symbolic names for all characters in the portable
- character set, making it possible to refer to any such character
- regardless of encoding.
-
- Implementations are free to describe more than one code set in a
- character set description file, as long as only one encoding exists for
- the characters in Table 2-3. For example, if an implementation defines
- ISO 8859-1 {5} as the primary code set, and ISO 8859-2 {6} as an
- alternate set, with each character from the alternate code set preceded
- in data by a shift code, a character set description file could contain a
- complete description of the primary set and those characters from the
- secondary that are not identical, the encoding of the latter including
- the shift code.
-
- Implementations are free to choose their own symbolic names, as long as
- the names identified by this standard are also defined; this provides
- support for already existing ``character names.''
-
- The names selected for the members of the portable character set follow
- the ISO 8859 {5} and the ISO/IEC 10646 {B11} standards. However, several
- commonly used UNIX system names occur as synonyms in the list:
-
- - The traditional UNIX system names are used for control characters.
-
- - The word ``slash'' is in addition to ``solidus.''
-
- - The word ``backslash'' is in addition to ``reverse-solidus.''
-
- - The word ``hyphen'' is in addition to ``hyphen-minus.''
-
- - The word ``period'' is in addition to ``full-stop.''
-
- - For the digits, the word ``digit'' is eliminated.
-
- - For letters, the words ``Latin Capital Letter'' and ``Latin Small
- Letter'' are eliminated.
-
- - The words ``left-brace'' and ``right-brace'' are in addition to
- ``left-curly-bracket'' and ``right-curly-bracket.''
-
- - The names of the digits are preferred over the numbers, to avoid
- possible confusion between ``0'' and ``O'', and between ``1'' and
- ``l'' (one and the letter ell).
-
- The names for the control characters in Table 2-4 were taken from
- ISO 4873 {4}.
-
- The charmap file was introduced to resolve problems with the portability
- of, especially, localedef sources. This standard assumes that the
- portable character set is constant across all locales, but does not
- prohibit implementations from supporting two incompatible codings, such
- as both ASCII and EBCDIC. Such ``dual-support'' implementations should
- have all charmaps and localedef sources encoded using one portable
- character set, in effect ``cross-compiling'' for the other environment.
- Naturally, charmaps (and localedef sources) are only portable without
- transformation between systems using the same encodings for the portable
- character set. They can, however, be transformed between two sets using
- only a subset of the actual characters (the portable set). However, the
- particular coded character set used for an application or an
- implementation does not necessarily imply different characteristics or
- collation: on the contrary, these attributes should in many cases be
- identical, regardless of code set. The charmap provides the capability
- to define a common locale definition for multiple code sets (the same
- localedef source can be used for code sets with different extended
- characters; the ability in the charmap to define ``empty'' names allows
- for characters missing in certain code sets).
-
- In addition, several implementors have expressed an interest in using the
- charmap concept to provide the information required for support of
- multiple character sets. Examples of such information are encoding
- mechanism, string parsing rules, default font information, etc. Such
- extensions are not described here.
-
- The <escape_char> declaration was added at the request of the
- international community to ease the creation of portable charmap files on
- terminals not implementing the default backslash escape. (This approach
- was adopted because this is a new interface invented by POSIX.2.
- Historical interfaces, such as the shell command language and awk, have
- not been modified to accommodate this type of terminal.) The
- <comment_char> declaration was added at the request of the international
- community to eliminate the potential confusion between the number sign
- and the pound sign.
-
- The octal number notation with no leading zero required was selected to
- match those of awk and tr and is consistent with that used by localedef.
- To avoid confusion between an octal constant and the backreferences used
- in localedef source, the octal, hexadecimal, and decimal constants must
- contain at least two digits. As single-digit constants are relatively
- rare, this should not impose any significant hardship. Each of the
- constants includes ``two or more'' digits to account for systems in which
- the byte size is larger than eight bits. For example, a Unicode system
- that has defined 16-bit bytes may require six octal, four hexadecimal,
- and five decimal digits.
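-
- The digit counts quoted above follow from the largest value that a byte
- of the given size can hold. The following Python sketch works the
- arithmetic; the function name digits_needed is hypothetical.
-
```python
def digits_needed(bits):
    """Digits required to write the largest value of a bits-wide byte."""
    max_value = (1 << bits) - 1
    return {
        "octal": len(format(max_value, "o")),
        "hexadecimal": len(format(max_value, "x")),
        "decimal": len(str(max_value)),
    }


if __name__ == "__main__":
    print(8, digits_needed(8))
    print(16, digits_needed(16))
```
-
- For 16 bits the largest value is 65535, which is 177777 in octal and
- ffff in hexadecimal, matching the six, four, and five digits cited.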
-
- The decimal notation is supported because some newer international
- standards define character values in decimal, rather than in the old
- column/row notation.
-
- The charmap identifies the coded character sets supported by an
- implementation. At least one charmap must be provided, but no
- implementation is required to provide more than one. Likewise,
- implementations can allow users to generate new charmaps (for instance
- for a new version of the 8859 family of coded character sets), but do
- not have to do so. If users are allowed to create new charmaps, the
- system documentation must describe the rules that apply (for instance:
- ``only coded character sets that are supersets of ISO/IEC 646 {1} IRV, no
- multibyte characters, etc.'')
-
- END_RATIONALE
-
- 2.5 Locale
-
- A locale is the definition of the subset of a user's environment that
- depends on language and cultural conventions. It is made up from one or
- more categories. Each category is identified by its name and controls
- specific aspects of the behavior of components of the system. Category
- names correspond to the following environment variable names:
-
- LC_CTYPE Character classification and case conversion.
-
- LC_COLLATE Collation order.
-
- LC_TIME Date and time formats.
-
- LC_NUMERIC Numeric, nonmonetary formatting.
-
- LC_MONETARY Monetary formatting.
-
- LC_MESSAGES Formats of informative and diagnostic messages and
- interactive responses.
-
- Conforming implementations shall provide the standard utilities and the
- interfaces in Annex B (if that option is supported) with the capability
- to modify their behavior based on the current locale, as defined in the
- Environment Variables subclause for each utility and interface.
-
- Locales other than those supplied by the implementation can be created
- via the localedef utility (see 4.35), provided that the
- {POSIX2_LOCALEDEF} symbol is defined on the system; see 2.13.2.
- Otherwise, only the implementation-provided locale(s) can be used. The
- input to the utility is described in 2.5.2. The value that shall be used
- to specify a locale when using environment variables shall be the string
- specified as the name operand to the localedef utility when the locale
- was created. The strings "C" and "POSIX" are reserved as identifiers for
- the POSIX Locale (see 2.5.1). When the value of a locale environment
- variable begins with a slash (/), it shall be interpreted as the pathname
- of the locale definition. If the value of the variable does not
- begin with a slash, the mechanism used to locate the locale is
- implementation defined.
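-
- The rules above for interpreting the value of a locale environment
- variable can be summarized in a short Python sketch. It is illustrative
- only: the function name classify_locale_value and the returned labels
- are hypothetical, and the treatment of an empty value simply defers to
- 2.6.
-
```python
def classify_locale_value(value):
    """Say how a locale environment variable value would be interpreted."""
    if not value:
        return "unset-or-empty"          # handled as described in 2.6
    if value in ("C", "POSIX"):
        return "posix-locale"            # reserved identifiers
    if value.startswith("/"):
        return "pathname"                # pathname of the locale definition
    return "implementation-defined"      # lookup mechanism is not specified


if __name__ == "__main__":
    for sample in ("POSIX", "/usr/local/locales/example", "example", ""):
        print(repr(sample), "->", classify_locale_value(sample))
```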
-
- If different character sets are used by the locale categories, the
- results achieved by an application utilizing these categories are
- undefined. Likewise, if different code sets are used for the data being
- processed by interfaces whose behavior is dependent on the current
- locale, or the code set is different from the code set assumed when the
- locale was created, the result is also undefined.
-
- BEGIN_RATIONALE
-
- 2.5.0.1 Locale Rationale. (This subclause is not a part of P1003.2)
-
- The description of locales is based on work performed in the UniForum
- Technical Committee Subcommittee on Internationalization. Wherever
- appropriate, keywords were taken from the C Standard {7} or the X/Open
- Portability Guide {B31}.
-
- The value that shall be used to specify a locale when using environment
- variables is the name specified as the name operand to the localedef
- utility when the locale was created. This provides a verifiable method
- to create and invoke a locale.
-
- The ``object'' definitions need not be portable, as long as ``source''
- definitions are. Strictly speaking, ``source'' definitions are portable
- only between implementations using the same character set(s). Such
- ``source'' definitions can, if they use symbolic names only, easily be
- ported between systems using different code sets as long as the
- characters in the portable character set (Table 2-3) have common values
- between the code sets; this is frequently the case in historical
- implementations. Of course, this requires that the symbolic names used
- for characters outside the portable character set are identical between
- character sets. The definition of symbolic names for characters is
- outside the scope of this standard, but is certainly within the scope of
- other standards organizations. When such names are standardized, future
- versions of POSIX.2 should require the use of these names.
-
- Applications can select the desired locale by invoking the setlocale()
- function (or equivalent) with the appropriate value. If the function is
- invoked with an empty string, the value of the corresponding environment
- variable is used. If the environment variable is unset or is set to the
- empty string, the implementation sets the appropriate environment as
- defined in 2.6.
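-
- A minimal Python sketch of the selection described above, for illustration
- only. The precedence order (LC_ALL, then the category variable, then LANG)
- is an assumption taken from the environment variable rules in 2.6, which
- are not reproduced here. The function name is hypothetical, and the
- "POSIX" fallback merely stands in for the implementation-defined default.

```python
def effective_locale_name(category, environ, default="POSIX"):
    """Return the locale name an empty-string setlocale() request would use.

    category -- name of a category variable, e.g. "LC_CTYPE"
    environ  -- mapping of environment variable names to values
    default  -- stand-in for the implementation-defined default locale
    """
    # Assumed precedence: LC_ALL overrides the category variable,
    # which in turn overrides LANG.
    for variable in ("LC_ALL", category, "LANG"):
        value = environ.get(variable, "")
        if value:  # unset and empty-string values are both skipped
            return value
    return default
```

- In Python itself, locale.setlocale(locale.LC_CTYPE, "") asks the C
- library to perform the equivalent lookup against the real environment.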
-
- END_RATIONALE
-
-
- 2.5.1 POSIX Locale
-
- Conforming implementations shall provide a POSIX Locale. The behavior of
- standard utilities in the POSIX Locale shall be as if the locale was
- defined via the localedef utility with input data from Table 2-5,
- Table 2-7, Table 2-8, Table 2-9, Table 2-10, and Table 2-11, all in
- 2.5.2.
-
- The tables describe the characteristics and behavior of the POSIX Locale
- for data consisting entirely of characters from the portable character
- set in Table 2-3 and the control characters in Table 2-4. For characters
- other than those in the two tables, the behavior is unspecified.
-
- The POSIX Locale can be specified by assigning the appropriate
- environment variables the values "C" or "POSIX".
-
- Table 2-5 shows the definition for the LC_CTYPE category.
-
- Table 2-7 shows the definition for the LC_COLLATE category.
-
- Table 2-8 shows the definition for the LC_MONETARY category.
-
- Table 2-9 shows the definition for the LC_NUMERIC category.
-
- Table 2-10 shows the definition for the LC_TIME category.
-
- Table 2-11 shows the definition for the LC_MESSAGES category.
-
- BEGIN_RATIONALE
-
-
- 2.5.1.1 POSIX Locale Rationale. (This subclause is not a part of P1003.2)
-
- The POSIX Locale is equal to the "C" locale, as specified in POSIX.1 {8}.
- To avoid being classified as a C-language function, the name has been
- changed to the POSIX Locale; the environment variable value can be either
- "POSIX", or, for historical reasons, "C".
-
- The POSIX definitions mirror the historical UNIX system behavior.
-
- The use of symbolic names for characters in the tables does not imply
- that the POSIX Locale must be described using symbolic character names,
- but merely that it may be advantageous to do so.
-
- Implementations must define a locale as the ``default'' locale, to be
- invoked when no environment variables are set, or set to the empty
- string. This default locale can be the POSIX Locale or any other,
- implementation-defined locale. Some implementations may provide
- facilities for local installation administrators to set the default
- locale, customizing it for each location. This standard does not require
- such a facility. 1
-
- END_RATIONALE 1
-
-
- 2.5.2 Locale Definition
-
- The capability to specify locales in addition to those provided by an
- implementation is optional (see 2.13.2). If the option is not supported,
- only implementation-supplied locales are available. Such locales shall
- be documented using the format specified in this clause.
-
- Locales can be described with the file format presented in this
- subclause. The file format is that accepted by the localedef utility
- (see 4.35). For the purposes of this subclause, the file is referred to
- as the locale definition file, but no locales shall be affected by this
- file unless it is processed by localedef or some similar mechanism. Any 1
- requirements in this subclause imposed upon ``the utility'' shall apply 1
- to localedef or to any other similar utility used to install locale 1
- information using the locale definition file format described here. 1
-
- The locale definition file shall contain one or more locale category
- source definitions, and shall not contain more than one definition for
- the same locale category. If the file contains source definitions for
- more than one category, implementation-defined categories, if present,
- shall appear after the categories defined by this clause (2.5). A
- category source definition shall contain either the definition of a
- category or a copy directive. For a description of the copy directive,
- see 4.35. In the event that some of the information for a locale
- category, as specified in this standard, is missing from the locale
- source definition, the behavior of that category, if it is referenced, is
- unspecified.
-
- A category source definition shall consist of a category header, a
- category body, and a category trailer. A category header shall consist
- of the character string naming the category, beginning with the
- characters LC_. The category trailer shall consist of the string END, 1
- followed by one or more <blank>s and the string used in the corresponding 1
- category header.
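-
- The header, body, and trailer structure can be sketched as follows.
- This is an illustration only, not part of this standard: the function
- name is hypothetical, and the input is assumed to be a list of lines from
- which blank lines and comment lines have already been removed.

```python
def split_categories(lines):
    """Group lines into {category name: [body lines]}.

    Assumes blank and comment lines were already dropped.
    Raises ValueError for a missing header or trailer, or for a
    category that is defined more than once.
    """
    categories = {}
    current = None  # name of the category being read, if any
    for line in lines:
        words = line.split()
        if current is None:
            # Outside a category: only a header such as LC_CTYPE is valid.
            if len(words) != 1 or not words[0].startswith("LC_"):
                raise ValueError("expected a category header: %r" % line)
            if words[0] in categories:
                raise ValueError("duplicate category: " + words[0])
            current = words[0]
            categories[current] = []
        elif len(words) == 2 and words[0] == "END" and words[1] == current:
            # Trailer: END, one or more blanks, then the header string.
            current = None
        else:
            categories[current].append(line)
    if current is not None:
        raise ValueError("missing trailer for " + current)
    return categories
```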
-
- The category body shall consist of one or more lines of text. Each line
- shall contain an identifier, optionally followed by one or more operands.
- Identifiers shall be either keywords, identifying a particular locale
- element, or collating elements. In addition to the keywords defined in
- this standard, the source can contain implementation-defined keywords.
- Each keyword within a locale shall have a unique name (i.e., two
- categories cannot have a commonly-named keyword); no keyword shall start
- with the characters LC_. Identifiers shall be separated from the
- operands by one or more <blank>s.
-
- Operands shall be characters, collating elements, or strings of
- characters. Strings shall be enclosed in double-quotes. Literal 1
- double-quotes within strings shall be preceded by the <escape character>, 1
- described below. When a keyword is followed by more than one operand, 1
- the operands shall be separated by semicolons; <blank>s shall be allowed
- before and/or after a semicolon.
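-
- One possible way to separate the operands of a keyword is sketched below,
- for illustration only. The function name is hypothetical; the sketch
- assumes the operand text has already been separated from its keyword, and
- it keeps each escape character together with the character it protects
- rather than interpreting it.

```python
def split_operands(text, escape="\\"):
    """Split operand text at semicolons that are outside quoted strings.

    Escaped characters (including escaped double-quotes and escaped
    semicolons) are copied through unchanged and never act as separators.
    Blanks around each operand are removed.
    """
    operands = []
    current = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            # Keep the escape character and the character it protects.
            current.append(text[i:i + 2])
            i += 2
            continue
        if ch == '"':
            in_string = not in_string
        if ch == ";" and not in_string:
            operands.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    operands.append("".join(current).strip())
    return operands
```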
-
- The first category header in the file can be preceded by a line modifying
- the comment character. It shall have the following format, starting in
- column 1:
-
- "comment_char %c\n", <comment character>
-
- The comment character shall default to the number-sign (#). Blank lines
- and lines containing the <comment character> in the first position shall be
- ignored.
-
- The first category header in the file can be preceded by a line modifying
- the escape character to be used in the file. It shall have the following
- format, starting in column 1:
-
- "escape_char %c\n", <escape character>
-
- The escape character shall default to backslash, which is the character
- used in all examples shown in this standard.
-
- A line can be continued by placing an escape character as the last
- character on the line; this continuation character shall be discarded 1
- from the input. Although the implementation need not accept any one 1
- portion of a continued line with a length exceeding {LINE_MAX} bytes, it 1
- shall place no limits on the accumulated length of the continued line. 1
- Comment lines shall not be continued on a subsequent line using an 1
- escaped <newline>.
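-
- The handling of comment_char, escape_char, comment lines, and continued
- lines can be sketched as a small preprocessing step. This is an
- illustration only, not part of this standard; the function name is
- hypothetical, and the sketch does not distinguish a trailing escaped
- escape character from a continuation.

```python
def logical_lines(text):
    """Yield the logical lines of a locale definition source.

    Blank lines and comment lines are dropped, comment_char and
    escape_char lines before the first category header are honored,
    and continued lines are joined with the escape character discarded.
    """
    comment_char, escape_char = "#", "\\"  # defaults named in the text
    seen_header = False
    pending = None  # text accumulated from a continued line, or None
    for raw in text.splitlines():
        if pending is None:
            if not raw.strip() or raw.startswith(comment_char):
                continue  # comments are never continued, so just skip
            if not seen_header:
                word, _, rest = raw.partition(" ")
                if word == "comment_char" and rest.strip():
                    comment_char = rest.strip()[0]
                    continue
                if word == "escape_char" and rest.strip():
                    escape_char = rest.strip()[0]
                    continue
            line = raw
        else:
            line = pending + raw
        if line.endswith(escape_char):
            pending = line[:-1]  # discard the continuation character
            continue
        pending = None
        if line.startswith("LC_"):
            seen_header = True
        yield line
    if pending is not None:
        yield pending
```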
-
- Individual characters, characters in strings, and collating elements 2
- shall be represented using symbolic names, as defined below. In 2
- addition, characters can be represented using the characters themselves, 2
- or as octal, hexadecimal, or decimal constants. When nonsymbolic 2
- notation is used, the resultant locale definitions need not be portable 2
- between systems. The left angle bracket (<) is a reserved symbol, 2
- denoting the start of a symbolic name; when used to represent itself it 2
- shall be preceded by the escape character. The following rules apply to 2
- character representation: 2
-
- (1) A character can be represented via a symbolic name, enclosed 2
- within angle brackets (< and >). The symbolic name, including 2
- the angle brackets, shall exactly match a symbolic name defined 2
- in the charmap file specified via the localedef -f option, and 2
- shall be replaced by a character value determined from the value 2
- associated with the symbolic name in the charmap file. The use 2
- of a symbolic name not found in the charmap file shall 1
- constitute an error, unless the category is LC_CTYPE or
- LC_COLLATE, in which case it shall constitute a warning
- condition (see localedef in 4.35 for a description of action
- resulting from errors and warnings). The specification of a
- symbolic name in a collating-element or collating-symbol clause
- that duplicates a symbolic name in the charmap file (if present)
- is an error. Use of the escape character or a right angle
- bracket within a symbolic name shall be invalid unless the
- character is preceded by the escape character.
-
- Example: <c>;<c-cedilla> "<M><a><y>"
-
- (2) A character can be represented by the character itself, in which 2
- case the value of the character is implementation defined. 2
- Within a string, the double-quote character, the escape 2
- character, and the right angle bracket character shall be 2
- escaped (preceded by the escape character) to be interpreted as 2
- the character itself. Outside strings, the characters 2
-
- , ; < > escape_char 2
-
- shall be escaped to be interpreted as the character itself. 2
-
- Example: c B "May"
-
- (3) A character can be represented as an octal constant. An octal 2
- constant shall be specified as the escape character followed by 1
- two or more octal digits. Each constant shall represent a byte 1
- value. Multibyte characters can be represented by concatenated
- constants.
-
- Example: \143;\347;\143\150 "\115\141\171"
-
- (4) A character can be represented as a hexadecimal constant. A 2
- hexadecimal constant shall be specified as the escape character 2
- followed by an x followed by two or more hexadecimal digits. 1
- Each constant shall represent a byte value. Multibyte
- characters can be represented by concatenated constants.
-
- Example: \x63;\xe7;\x63\x68 "\x4d\x61\x79"
-
- (5) A character can be represented as a decimal constant. A decimal 2
- constant shall be specified as the escape character followed by 2
- a d followed by two or more decimal digits. Each constant shall 1
- represent a byte value. Multibyte characters can be represented by
- concatenated constants.
-
- Example: \d99;\d231;\d99\d104 "\d77\d97\d121"
-
- Implementations may accept single-digit octal, decimal, or hexadecimal 1
- constants following the escape character. Only characters existing in 1
- the character set for which the locale definition is created shall be 1
- specified, whether using symbolic names, the characters themselves, or 1
- octal, decimal, or hexadecimal constants. If a charmap file is present, 2
- only characters defined in the charmap can be specified using octal, 2
- decimal, or hexadecimal constants. Symbolic names not present in the 2
- charmap file can be specified and shall be ignored, as specified under 2
- item (1) above. 2
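-
- The octal, hexadecimal, and decimal notations of items (3) through (5)
- can be decoded as sketched below. This is an illustration only, not part
- of this standard. The function name is hypothetical, the sketch requires
- at least two digits per constant (it does not accept the optional
- single-digit form), and it returns integer byte values so that bytes
- wider than eight bits can be represented.

```python
import re

# After the escape character: x + hex digits, d + decimal digits,
# or bare octal digits; each form needs two or more digits.
_CONSTANT = re.compile(r"x([0-9A-Fa-f]{2,})|d([0-9]{2,})|([0-7]{2,})")


def decode_constants(text, escape="\\"):
    """Decode concatenated constants into a list of byte values."""
    values = []
    pos = 0
    while pos < len(text):
        if text[pos] != escape:
            raise ValueError("expected escape character at offset %d" % pos)
        match = _CONSTANT.match(text, pos + 1)
        if match is None:
            raise ValueError("malformed constant at offset %d" % pos)
        hex_digits, dec_digits, oct_digits = match.groups()
        if hex_digits is not None:
            values.append(int(hex_digits, 16))
        elif dec_digits is not None:
            values.append(int(dec_digits, 10))
        else:
            values.append(int(oct_digits, 8))
        pos = match.end()
    return values
```

- Applied to the three string examples above, each notation yields the
- byte values 77, 97, and 121.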
-
- BEGIN_RATIONALE 2
-
- 2.5.2.0.1 Locale Definition Rationale. (This subclause is not a part of P1003.2)
-
- The decision to separate the file format from the localedef utility 1
- description was only partially editorial. Implementations may provide 1
- interfaces other than localedef. Requirements on ``the utility,'' mostly 1
- concerning error messages, are described in this way because they are 1
- meant to affect the other interfaces implementations may provide as well 1
- as localedef. (This is similar to the philosophy used by POSIX.1 {8} 1
- where the descriptions of the tar and cpio file formats impose 1
- requirements on any utilities processing them.) 1
-
- The text about {POSIX2_LOCALEDEF} does not mean that internationalization
- is optional; only that the functionality of the localedef utility is.
- Regular expressions, for instance, must still be able to recognize
- character class expressions such as [[:alpha:]].
-
- A possible analogy is with an applications development environment:
- while all conforming implementations must be capable of executing
- applications, not all need to have the development environment installed.
- The assumption is that the capability to modify the behavior of utilities
- (and applications) via locale settings must be supported. If the
- localedef utility is not present, then the only choice is to select an
- existing (presumably implementation-documented) locale. An
- implementation could, for example, choose to support only the POSIX
- Locale, which would in effect limit the number of changes from historical
- implementations quite drastically. The localedef utility is still
- required, but would always terminate with an exit code indicating that no
- locale could be created. Supported locales must be documented using the
- syntax defined in 2.5. (This ensures that users can accurately determine
- what capabilities are provided. If the implementation decides to provide
- additional capabilities to the ones in 2.5, that is already provided
- for.)
-
- If the option is present (i.e., locales can be created), then the
- localedef utility must be capable of creating locales based on the syntax
- and rules defined in 2.5. This does not mean that the implementation
- cannot also provide alternate means for creating locales.
-
- The octal, decimal, and hexadecimal notations are the same as those employed by 1
- the charmap facility (see 2.4.1). To avoid confusion between an octal 1
- constant and a backreference, the octal, hexadecimal, and decimal 1
- constants must contain at least two digits. As single-digit constants 1
- are relatively rare, this should not impose any significant hardship. 1
- Each of the constants includes ``two or more'' digits to account for 1
- systems in which the byte size is larger than eight bits. For example, a 1
- Unicode system that has defined 16-bit bytes may require six octal, four 1
- hexadecimal, and five decimal digits. 1
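-
- The digit counts quoted above follow from the largest value a byte of a
- given width can hold, as this small Python check illustrates (the function
- name is hypothetical):

```python
def digits_needed(bits):
    """Return the digits needed to write the largest byte value of this width."""
    largest = (1 << bits) - 1  # e.g. 65535 for a 16-bit byte
    return {
        "octal": len(format(largest, "o")),
        "hexadecimal": len(format(largest, "x")),
        "decimal": len(str(largest)),
    }
```

- For a 16-bit byte the largest value is 65535, written 177777 in octal and
- ffff in hexadecimal.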
-
- This standard is intended as an international (ISO/IEC) standard as well 1
- as an IEEE standard, and must therefore follow the ISO/IEC guidelines. 1
- One such rule is that characters outside the invariant part of 1
- ISO/IEC 646 {1} should not be used in portable specifications. The 1
- backslash character is not in the invariant part; the number-sign is, but 1
- with multiple representations: as a number-sign and as a pound sign. As 1
- far as general usage of these symbols, they are covered by the 1
- ``grandfather clause,'' but for newly defined interfaces, ISO has 1
- requested that POSIX provide alternate representations. Consequently, 1
- while the default escape character remains the backslash, and the default 1
- comment character is the number-sign, implementations are required to 1
- recognize alternative representations, identified in the applicable 1
- source file via the escape_char and comment_char keywords. 1
-
- END_RATIONALE 1
-
-
- 2.5.2.1 LC_CTYPE
-
- Table 2-5 - LC_CTYPE Category Definition in the POSIX Locale
- __________________________________________________________________________________________________________________________________________________
- LC_CTYPE
- # The following is the POSIX Locale LC_CTYPE.
- # "alpha" is by default "upper" and "lower"
- # "alnum" is by definition "alpha" and "digit"
- # "print" is by default "alnum", "punct" and the <space> character
- # "graph" is by default "alnum" and "punct"
-
- #
- upper <A>;<B>;<C>;<D>;<E>;<F>;<G>;<H>;<I>;<J>;<K>;<L>;<M>;\
- <N>;<O>;<P>;<Q>;<R>;<S>;<T>;<U>;<V>;<W>;<X>;<Y>;<Z>
- #
- lower <a>;<b>;<c>;<d>;<e>;<f>;<g>;<h>;<i>;<j>;<k>;<l>;<m>;\
- <n>;<o>;<p>;<q>;<r>;<s>;<t>;<u>;<v>;<w>;<x>;<y>;<z>
- #
- digit <zero>;<one>;<two>;<three>;<four>;<five>;<six>;<seven>;<eight>;<nine>
- #
- space <tab>;<newline>;<vertical-tab>;<form-feed>;<carriage-return>;<space>
- #
- cntrl <alert>;<backspace>;<tab>;<newline>;<vertical-tab>;\
- <form-feed>;<carriage-return>;\
- <NUL>;<SOH>;<STX>;<ETX>;<EOT>;<ENQ>;<ACK>;<SO>;\
- <SI>;<DLE>;<DC1>;<DC2>;<DC3>;<DC4>;<NAK>;<SYN>;\
- <ETB>;<CAN>;<EM>;<SUB>;<ESC>;<IS4>;<IS3>;<IS2>;\
- <IS1>;<DEL>
- #
- punct <exclamation-mark>;<quotation-mark>;<number-sign>;\
- <dollar-sign>;<percent-sign>;<ampersand>;<apostrophe>;\
- <left-parenthesis>;<right-parenthesis>;<asterisk>;\
- <plus-sign>;<comma>;<hyphen>;<period>;<slash>;\
- <colon>;<semicolon>;<less-than-sign>;<equals-sign>;\
- <greater-than-sign>;<question-mark>;<commercial-at>;\
- <left-square-bracket>;<backslash>;<right-square-bracket>;\
- <circumflex>;<underline>;<grave-accent>;\
- <left-curly-bracket>;<vertical-line>;<right-curly-bracket>;<tilde>
- #
- xdigit <zero>;<one>;<two>;<three>;<four>;<five>;<six>;<seven>;<eight>;\
- <nine>;<A>;<B>;<C>;<D>;<E>;<F>;<a>;<b>;<c>;<d>;<e>;<f>
- #
- blank <space>;<tab>
- #
- toupper (<a>,<A>);(<b>,<B>);(<c>,<C>);(<d>,<D>);(<e>,<E>);\
- (<f>,<F>);(<g>,<G>);(<h>,<H>);(<i>,<I>);(<j>,<J>);\
- (<k>,<K>);(<l>,<L>);(<m>,<M>);(<n>,<N>);(<o>,<O>);\
- (<p>,<P>);(<q>,<Q>);(<r>,<R>);(<s>,<S>);(<t>,<T>);\
- (<u>,<U>);(<v>,<V>);(<w>,<W>);(<x>,<X>);(<y>,<Y>);(<z>,<Z>)
- #
- tolower (<A>,<a>);(<B>,<b>);(<C>,<c>);(<D>,<d>);(<E>,<e>);\
- (<F>,<f>);(<G>,<g>);(<H>,<h>);(<I>,<i>);(<J>,<j>);\
- (<K>,<k>);(<L>,<l>);(<M>,<m>);(<N>,<n>);(<O>,<o>);\
- (<P>,<p>);(<Q>,<q>);(<R>,<r>);(<S>,<s>);(<T>,<t>);\
- (<U>,<u>);(<V>,<v>);(<W>,<w>);(<X>,<x>);(<Y>,<y>);(<Z>,<z>)
- END LC_CTYPE
- __________________________________________________________________________________________________________________________________________________
-
- The LC_CTYPE category shall define character classification, case
- conversion, and other character attributes. In addition, a series of
- characters can be represented by three adjacent periods representing an 1
- ellipsis symbol (``...''). The ellipsis specification shall be 1
- interpreted as meaning that all values between the values preceding and 1
- following it represent valid characters. The ellipsis specification 1
- shall be valid only within a single encoded character set. An ellipsis shall
- be interpreted as including in the list all characters with an encoded
- value higher than the encoded value of the character preceding the
- ellipsis and lower than the encoded value of the character following the
- ellipsis.
-
- Example: \x30;...;\x39; includes in the character class all characters
- with encoded values between the endpoints.
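-
- The ellipsis rule can be sketched as follows, for illustration only. The
- function name is hypothetical, and the operands are assumed to have been
- reduced already to integer encoded values, with the string "..." standing
- for the ellipsis symbol.

```python
ELLIPSIS = "..."


def expand_ellipsis(operands):
    """Replace each ellipsis with the encoded values strictly between
    the operand before it and the operand after it."""
    result = []
    last = len(operands) - 1
    for i, item in enumerate(operands):
        if item != ELLIPSIS:
            result.append(item)
            continue
        if i == 0 or i == last or ELLIPSIS in (operands[i - 1], operands[i + 1]):
            raise ValueError("an ellipsis needs a character on each side")
        low, high = operands[i - 1], operands[i + 1]
        if low >= high:
            raise ValueError("ellipsis endpoints must be in ascending order")
        # Endpoints are listed explicitly, so add only the values between.
        result.extend(range(low + 1, high))
    return result
```

- Because the endpoints appear as ordinary operands, the expansion of the
- example above covers the ten values 0x30 through 0x39.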
-
- The following keywords shall be recognized. In the descriptions, the
- term ``automatically included'' means that it shall not be an error to
- either include the referenced characters or to omit them; the
- implementation shall provide them if missing and accept them silently if
- present.
-
- copy Specify the name of an existing locale to be used as the
- source for the definition of this category. If this
- keyword is specified, no other keyword shall be
- specified.
-
- upper Define characters to be classified as uppercase letters.
- No character specified for the keywords cntrl, digit,
- punct, or space shall be specified. If this keyword is 2
- not specified, the uppercase letters A through Z, as 2
- defined in Table 2-3 (see 2.4.1), shall automatically 2
- belong to this class, with implementation-defined 2
- character values. 2
-
- lower Define characters to be classified as lowercase letters.
- No character specified for the keywords cntrl, digit,
- punct, or space shall be specified. If this keyword is 2
- not specified, the lowercase letters a through z, as 2
- defined in Table 2-3 (see 2.4.1), shall automatically 2
- belong to this class, with implementation-defined 2
- character values. 2
-
- alpha Define characters to be classified as letters. No
- character specified for the keywords cntrl, digit, punct,
- or space shall be specified. In addition, characters
- classified as either upper or lower shall automatically
- belong to this class.
-
- digit Define the characters to be classified as numeric digits. 2
- Only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 shall be 2
- specified, and in ascending sequence by numerical value. 2
- If this keyword is not specified, the digits 0 through 9, 2
- as defined in Table 2-3 (see 2.4.1), shall automatically 2
- belong to this class, with implementation-defined 2
- character values. 2
-
- space Define characters to be classified as white-space
- characters. No character specified for the keywords
- upper, lower, alpha, digit, graph, or xdigit shall be 1
- specified. If this keyword is not specified, the 2
- characters <space>, <form-feed>, <newline>, 2
- <carriage-return>, <tab>, and <vertical-tab>, as defined in 2
- Table 2-3 (see 2.4.1), shall automatically belong to this 2
- class, with implementation-defined character values. Any 2
- characters included in the class blank shall be 1
- automatically included. 1
-
- cntrl Define characters to be classified as control characters.
- No character specified for the keywords upper, lower,
- alpha, digit, punct, graph, print, or xdigit shall be 1
- specified. 1
-
- punct Define characters to be classified as punctuation
- characters. No character specified for the keywords
- upper, lower, alpha, digit, cntrl, xdigit, or as the
- <space> character shall be specified.
-
- graph Define characters to be classified as printable
- characters, not including the <space> character. If this
- keyword is not specified, characters specified for the
- keywords upper, lower, alpha, digit, xdigit, and punct
- shall belong to this character class. No character
- specified for the keyword cntrl shall be specified.
-
- print Define characters to be classified as printable
- characters, including the <space> character. If this
- keyword is not provided, characters specified for the
- keywords upper, lower, alpha, digit, xdigit, punct, and
- the <space> character shall belong to this character
- class. No character specified for the keyword cntrl
- shall be specified.
-
- xdigit Define the characters to be classified as hexadecimal
- digits. Only the characters defined for the class digit 2
- shall be specified, in ascending sequence by numerical 2
- value, followed by one or more sets of six characters 2
- representing the hexadecimal digits 10 through 15, with 2
- each set in ascending order (for example A, B, C, D, E, 2
- F, a, b, c, d, e, f). If this keyword is not specified, 2
- the digits 0 through 9, the uppercase letters A through 2
- F, and the lowercase letters a through f, as defined in 2
- Table 2-3 (see 2.4.1), shall automatically belong to this 2
- class, with implementation-defined character values. 2
-
- blank Define characters to be classified as <blank> characters.
- If this keyword is unspecified, the characters <space>
- and <tab> shall belong to this character class.
-
- toupper Define the mapping of lowercase letters to uppercase
- letters. The operand shall consist of character pairs,
- separated by semicolons. The characters in each
- character pair shall be separated by a comma and the pair
- enclosed by parentheses. The first character in each
- pair shall be the lowercase letter, the second the
- corresponding uppercase letter. Only characters
- specified for the keywords lower and upper shall be
- specified. If this keyword is not specified, the 2
- lowercase letters a through z, and their corresponding 2
- uppercase letters A through Z, as defined in Table 2-3 2
- (see 2.4.1), shall automatically be included, with 2
- implementation-defined character values. 2
-
- tolower Define the mapping of uppercase letters to lowercase
- letters. The operand shall consist of character pairs,
- separated by semicolons. The characters in each
- character pair shall be separated by a comma and the pair
- enclosed by parentheses. The first character in each
- pair shall be the uppercase letter, the second the
- corresponding lowercase letter. Only characters
- specified for the keywords lower and upper shall be
- specified.
-
- The tolower keyword is optional. If specified, the
- uppercase letters A through Z, as defined in Table 2-3,
- and their corresponding lowercase letters, shall be
- specified. If this keyword is not specified, the mapping
- shall be the reverse mapping of the one specified for
- toupper.
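-
- The relationship between the two case-mapping keywords can be sketched in
- Python, for illustration only. Both function names are hypothetical, and
- each mapping is assumed to be held as a list of (from, to) character
- pairs.

```python
def check_case_pairs(pairs, first_class, second_class):
    """Raise ValueError if a pair uses a character outside its class.

    For toupper, first_class is the lower class and second_class the
    upper class; for tolower the two are swapped.
    """
    for first, second in pairs:
        if first not in first_class or second not in second_class:
            raise ValueError("invalid pair (%s,%s)" % (first, second))


def default_tolower(toupper_pairs):
    """Derive tolower as the reverse of the toupper mapping, which is
    the mapping used when the tolower keyword is not specified."""
    return [(upper, lower) for lower, upper in toupper_pairs]
```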
-
- Table 2-6 shows the allowed character class combinations.
-
- Table 2-6 - Valid Character Class Combinations
- __________________________________________________________________________________________________________________________________________________
- In     |                        Can Also Belong To
- Class  | upper  lower  alpha  digit  space  cntrl  punct  graph  print  xdigit blank
- -------+----------------------------------------------------------------------------
- upper  |   -      -      M      X      X      X      X      D      D      -      X
- lower  |   -      -      M      X      X      X      X      D      D      -      X
- alpha  |   -      -      -      X      X      X      X      D      D      -      X
- digit  |   X      X      X      -      X      X      X      D      D      -      X
- space  |   X      X      X      X      -      -      *      *      *      X      -     2
- cntrl  |   X      X      X      X      -      -      X      X      X      X      -     2
- punct  |   X      X      X      X      -      X      -      D      D      X      -
- graph  |   -      -      -      -      -      X      -      -      -      -      -
- print  |   -      -      -      -      -      X      -      -      -      -      -
- xdigit |   -      -      -      -      X      X      X      D      D      -      X
- blank  |   X      X      X      X      M      -      *      *      *      X      -     2
-
- NOTES:
-
- (1) Explanation of codes:
-
- M Always
-
- D Default; belongs to class if not specified
-
- - Permitted
-
- X Mutually exclusive
-
- * See note (2)
-
- (2) The <space> character, which is part of the space and blank
- classes, cannot belong to punct or graph, but shall automatically
- belong to the print class. Other space or blank
- characters can be classified as punct, graph, and/or print.
-
- __________________________________________________________________________________________________________________________________________________
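-
- A check of proposed class memberships against Table 2-6 could look like
- the following Python sketch, offered for illustration only. The names
- CLASSES, TABLE_2_6, and conflicts are hypothetical. Only the mutually
- exclusive (X) entries are enforced; the entries marked with an asterisk
- are treated as permitted, so the special rule for the <space> character in
- note (2) is not modeled.

```python
CLASSES = ["upper", "lower", "alpha", "digit", "space", "cntrl",
           "punct", "graph", "print", "xdigit", "blank"]

# One string per "In Class" row of Table 2-6; columns follow CLASSES.
TABLE_2_6 = {
    "upper":  "--MXXXXDD-X",
    "lower":  "--MXXXXDD-X",
    "alpha":  "---XXXXDD-X",
    "digit":  "XXX-XXXDD-X",
    "space":  "XXXX--***X-",
    "cntrl":  "XXXX--XXXX-",
    "punct":  "XXXX-X-DDX-",
    "graph":  "-----X-----",
    "print":  "-----X-----",
    "xdigit": "----XXXDD-X",
    "blank":  "XXXXM-***X-",
}


def conflicts(memberships):
    """Return the sorted class pairs that one character may not share."""
    found = set()
    for row in memberships:
        for column in memberships:
            if TABLE_2_6[row][CLASSES.index(column)] == "X":
                found.add(tuple(sorted((row, column))))
    return sorted(found)
```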
-
-
- BEGIN_RATIONALE
-
- 2.5.2.1.1 LC_CTYPE Rationale. (This subclause is not a part of P1003.2)
-
- The LC_CTYPE category is used primarily to define the encoding-independent
- aspects of a character set, such as character classification.
- In addition, certain encoding-dependent characteristics are also defined
- for an application via the LC_CTYPE category. POSIX.2 does not mandate
- that the encoding used in the locale is the same as the one used by the
- application, because an implementation may decide that it is advantageous
- to define locales in a system-wide encoding rather than having multiple,
- logically identical locales in different encodings, and to convert from
- the application encoding to the system-wide encoding on usage. Other
- implementations could require encoding-dependent locales.
-
- In either case, the LC_CTYPE attributes that are directly dependent on
- the encoding, such as mb_cur_max and the display width of characters, are
- not user-specifiable in a locale source, and are consequently not defined
- as keywords.
-
- As the LC_CTYPE character classes are based on the C Standard {7}
- character-class definition, the category does not support multicharacter
- elements. For instance, the German character <sharp-s> is traditionally
- classified as a lowercase letter. There is no corresponding uppercase
- letter; in proper capitalization of German text the <sharp-s> will be
- replaced by SS; i.e., by two characters. This kind of conversion is
- outside the scope of the toupper and tolower keywords.
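-
- The limitation can be seen by comparing a character-pair mapping with a
- full case conversion. The Python sketch below is an illustration only;
- apply_toupper and POSIX_TOUPPER are hypothetical names, and Python's
- built-in str.upper() is used merely as an example of a conversion that is
- not restricted to single-character pairs.

```python
import string

# The 26 (lowercase, uppercase) pairs of the POSIX Locale toupper keyword.
POSIX_TOUPPER = list(zip(string.ascii_lowercase, string.ascii_uppercase))


def apply_toupper(text, pairs):
    """Convert text one character at a time using (from, to) pairs;
    characters without a pair are left unchanged."""
    table = dict(pairs)
    return "".join(table.get(ch, ch) for ch in text)


word = "stra\u00dfe"  # contains <sharp-s>
pairwise = apply_toupper(word, POSIX_TOUPPER)  # <sharp-s> is left as is
full = word.upper()  # replaces <sharp-s> with the two characters SS
```

- The pairwise result keeps the length of the input, while the full
- conversion yields a string one character longer.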
-
- Where POSIX.2 specifies that only certain characters can be specified, as 1
- for the keywords digit and xdigit, the specified characters must be from 1
- the portable character set, as shown. As an example, only the Arabic 1
- digits 0 through 9 are acceptable as digits. 1
-
- The character classes digit, xdigit, lower, upper, and space have a set 2
- of automatically included characters. These only need to be specified if 2
- the character values (i.e., encoding) differ from the implementation 2
- default values. 2
-
- The definition of character class digit requires that only ten 2
- characters--the ones defining digits--can be specified; alternate digits 2
- (e.g., Hindi or Kanji) cannot be specified here. However, the encoding 2
- may vary if an implementation supports more than one encoding. 2
-
- The definition of character class xdigit requires that the characters 2
- included in character class digit are included here also, and allows for 2
- different symbols for the hexadecimal digits 10 through 15. 2
-
- END_RATIONALE 2
-
-
- 2.5.2.2 LC_COLLATE
-
- A collation sequence definition shall define the relative order between
- collating elements (characters and multicharacter collating elements) in
- the locale. This order is expressed in terms of collation values; i.e.,
- by assigning each element one or more collation values (also known as
- collation weights). This does not imply that implementations shall
- assign such values, but that ordering of strings using the resultant
- collation definition in the locale shall behave as if such assignment is
- done and used in the collation process. The collation sequence
- definition shall be used by regular expressions, pattern matching, and
- sorting. The following capabilities are provided:
-
- (1) Multicharacter collating elements. Specification of
- multicharacter collating elements (i.e., sequences of two or
- more characters to be collated as an entity).
-
- (2) User-defined ordering of collating elements. Each collating
- element shall be assigned a collation value defining its order
- in the character (or basic) collation sequence. This ordering
- is used by regular expressions and pattern matching and, unless
- collation weights are explicitly specified, also as the
- collation weight to be used in sorting.
-
- (3) Multiple weights and equivalence classes. Collating elements
- can be assigned one or more (up to the limit {COLL_WEIGHTS_MAX})
- collating weights for use in sorting. The first weight is
- hereafter referred to as the primary weight.
-
- (4) One-to-Many mapping. A single character is mapped into a string
- of collating elements.
-
- (5) Many-to-Many substitution. A string of one or more characters
- is substituted by another string (or an empty string, i.e., the
- character or characters shall be ignored for collation
- purposes).
-
- (6) Equivalence class definition. Two or more collating elements
- have the same collation value (primary weight).
-
- (7) Ordering by weights. When two strings are compared to determine
- their relative order, the two strings are first broken up into a
- series of collating elements, and each successive pair of
- elements is compared according to the relative primary weights
- for the elements. If equal, and more than one weight has been
- assigned, then the pairs of collating elements are recompared
- according to the relative subsequent weights, until either a
- pair of collating elements compares unequal or the weights are
- exhausted.
-
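- The ordering by weights in item (7) can be sketched as follows. This is
- a minimal illustration in Python, not a definitive implementation; the
- WEIGHTS table, the function names, and the two-level setup are
- assumptions made for the example, and each character is treated as one
- collating element.
-
```python
# Hypothetical weights: (primary, secondary) for each collating element.
WEIGHTS = {"a": (1, 1), "A": (1, 2), "b": (2, 1), "B": (2, 2)}
LEVELS = 2  # number of weights per element in this sketch


def weights_of(text):
    """Break a string into elements and look up each element's weights."""
    return [WEIGHTS[ch] for ch in text]


def compare(left, right):
    """Return -1, 0, or 1 by comparing weight level by weight level."""
    lw, rw = weights_of(left), weights_of(right)
    for level in range(LEVELS):
        lkey = [w[level] for w in lw]
        rkey = [w[level] for w in rw]
        if lkey != rkey:
            # The first unequal pair at this level decides the order;
            # list comparison also ranks a proper prefix first.
            return -1 if lkey < rkey else 1
    return 0  # weights exhausted: the strings compare equal
```
-
- Here compare("Ab", "aB") is decided only at the secondary level,
- because the primary weights of both strings are equal.
-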
- The following keywords shall be recognized in a collation sequence
- definition. They are described in detail in the following subclauses.
-
- copy Specify the name of an existing locale to be
- used as the source for the definition of this
- category. If this keyword is specified, no
- other keyword shall be specified.
-
- collating-element Define a collating-element symbol representing a
- multicharacter collating element. This keyword
- is optional.
-
- collating-symbol Define a collating symbol for use in collation
- order statements. This keyword is optional.
-
- order_start Define collation rules. This statement is
- followed by one or more collation order
- statements, assigning character collation values
- and collation weights to collating elements.
-
- order_end Specify the end of the collation-order
- statements.
-
- Table 2-7 - LC_COLLATE Category Definition in the POSIX Locale
- __________________________________________________________________________________________________________________________________________________
- LC_COLLATE
- # This is the POSIX Locale definition for the LC_COLLATE category.
- # The order is the same as in the ASCII code set.
- order_start forward
- <NUL>
- <SOH>
- <STX>
- <ETX>
- <EOT>
- <ENQ>
- <ACK>
- <alert>
- <backspace>
- <tab>
- <newline>
- <vertical-tab>
- <form-feed>
- <carriage-return>
- <SO>
- <SI>
- <DLE>
- <DC1>
- <DC2>
- <DC3>
- <DC4>
- <NAK>
- <SYN>
- <ETB>
- <CAN>
- <EM>
- <SUB>
- <ESC>
- <IS4>
- <IS3>
- <IS2>
- <IS1>
- <space>
- <exclamation-mark>
- <quotation-mark>
- <number-sign>
- <dollar-sign>
- <percent-sign>
- <ampersand>
- <apostrophe>
- <left-parenthesis>
- <right-parenthesis>
- <asterisk>
- <plus-sign>
- <comma>
- <hyphen>
- <period>
- <slash>
- <zero>
- <one>
- <two>
- <three>
- <four>
- <five>
- <six>
- <seven>
- <eight>
- <nine>
- <colon>
- <semicolon>
- <less-than-sign>
- <equals-sign>
- <greater-than-sign>
- <question-mark>
- <commercial-at>
- <A>
- <B>
- <C>
- <D>
- <E>
- <F>
- <G>
- <H>
- <I>
- <J>
- <K>
- <L>
- <M>
- <N>
- <O>
- <P>
- <Q>
- <R>
- <S>
- <T>
- <U>
- <V>
- <W>
- <X>
- <Y>
- <Z>
- <left-square-bracket>
- <backslash>
- <right-square-bracket>
- <circumflex>
- <underline>
- <grave-accent>
- <a>
- <b>
- <c>
- <d>
- <e>
- <f>
- <g>
- <h>
- <i>
- <j>
- <k>
- <l>
- <m>
- <n>
- <o>
- <p>
- <q>
- <r>
- <s>
- <t>
- <u>
- <v>
- <w>
- <x>
- <y>
- <z>
- <left-curly-bracket>
- <vertical-line>
- <right-curly-bracket>
- <tilde>
- <DEL>
- order_end
- #
- END LC_COLLATE
- __________________________________________________________________________________________________________________________________________________
-
- 2.5.2.2.1 collating-element Keyword
-
- In addition to the collating elements in the character set, the
- collating-element keyword shall be used to define multicharacter
- collating elements. The syntax is
-
- "collating-element %s from %s\n", <collating-symbol>, <string>
-
- The <collating-symbol> operand shall be a symbolic name, enclosed between
- angle brackets (< and >), and shall not duplicate any symbolic name in
- the current charmap file (if any), or any other symbolic name defined in
- this collation definition. The string operand shall be a string of two
- or more characters that shall collate as an entity. A
- <collating-element> defined via this keyword is only recognized with the
- LC_COLLATE category.
-
- Example:
-
- collating-element <ch> from <c><h>
- collating-element <e-acute> from <acute><e>
- collating-element <ll> from ll
-
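- Once multicharacter collating elements have been defined with the
- collating-element keyword, a string is processed as a series of
- collating elements rather than single characters. A minimal Python
- sketch of that step follows. The function name split_elements and the
- longest-match rule are assumptions for illustration; the text above
- does not prescribe how an implementation recognizes the elements.
-
```python
def split_elements(text, multichar):
    """Break text into collating elements, preferring the longest match.

    multichar is a collection of strings of two or more characters that
    collate as an entity, for example {"ch", "ll"}.
    """
    by_length = sorted(multichar, key=len, reverse=True)
    elements = []
    i = 0
    while i < len(text):
        for seq in by_length:
            if text.startswith(seq, i):
                elements.append(seq)
                i += len(seq)
                break
        else:
            # No multicharacter element starts here: take one character.
            elements.append(text[i])
            i += 1
    return elements
```
-
- For instance, with the elements ``ch'' and ``ll'' defined, the string
- ``chill'' is split into the three elements ``ch'', ``i'', and ``ll''.
-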
- 2.5.2.2.2 collating-symbol Keyword
-
- This keyword shall be used to define symbols for use in collation
- sequence statements; i.e., between the order_start and the order_end
- keywords. The syntax is
-
- "collating-symbol %s\n", <collating-symbol>
-
- The <collating-symbol> shall be a symbolic name, enclosed between angle
- brackets (< and >), and shall not duplicate any symbolic name in the
- current charmap file (if any), or any other symbolic name defined in this
- collation definition. A <collating-symbol> defined via this keyword is
- only recognized with the LC_COLLATE category.
-
- Example:
-
- collating-symbol <UPPER_CASE>
- collating-symbol <HIGH>
-
- 2.5.2.2.3 order_start Keyword
-
- The order_start keyword shall precede collation order entries and also
- defines the number of weights for this collation sequence definition and
- other collation rules.
-
- The syntax of the order_start keyword is:
-
- "order_start %s;%s;...;%s\n", <sort-rules>, <sort-rules> ...
-
- The operands to the order_start keyword are optional. If present, the
- operands define rules to be applied when strings are compared. The
- number of operands defines how many weights each element is assigned; if
- no operands are present, one forward operand is assumed. If present, the
- first operand defines rules to be applied when comparing strings using
- the first (primary) weight; the second when comparing strings using the
- second weight, and so on. Operands shall be separated by semicolons (;).
- Each operand shall consist of one or more collation directives, separated
- by commas (,). If the number of operands exceeds the {COLL_WEIGHTS_MAX}
- limit, the utility shall issue a warning message. The following
- directives shall be supported:
-
- forward Specifies that comparison operations for the weight
- level shall proceed from start of string towards
- the end of string.
-
- backward Specifies that comparison operations for the weight
- level shall proceed from end of string towards the
- beginning of string.
-
- position Specifies that comparison operations for the weight
- level will consider the relative position of
- non-IGNOREd elements in the strings. The string
- containing a non-IGNOREd element after the fewest
- IGNOREd collating elements from the start of the
- compare shall collate first. If both strings
- contain a non-IGNOREd character in the same
- relative position, the collating values assigned to
- the elements shall determine the ordering. In case
- of equality, subsequent non-IGNOREd characters
- shall be considered in the same manner.
-
- The directives forward and backward are mutually exclusive.
-
- Example:
-
- order_start forward;backward
-
- If no operands are specified, a single forward operand shall be assumed.
-
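- The operand rules above can be sketched as a small parser. This is an
- illustrative Python sketch, not part of POSIX.2: the function name
- parse_order_start, the returned data shape, and the limit of 2 used
- for {COLL_WEIGHTS_MAX} are assumptions, and the warning is returned
- rather than written as a message.
-
```python
COLL_WEIGHTS_MAX = 2  # assumed value for this sketch only
DIRECTIVES = {"forward", "backward", "position"}


def parse_order_start(line, coll_weights_max=COLL_WEIGHTS_MAX):
    """Return (levels, warnings) for an order_start line.

    levels holds one set of directives per weight level.
    """
    fields = line.split(None, 1)
    if not fields or fields[0] != "order_start":
        raise ValueError("not an order_start line")
    operands = fields[1].strip() if len(fields) > 1 else ""
    if not operands:
        # No operands: a single forward operand is assumed.
        return [{"forward"}], []

    levels = []
    for operand in operands.split(";"):
        directives = {d.strip() for d in operand.split(",")}
        unknown = directives - DIRECTIVES
        if unknown:
            raise ValueError(f"unknown directive(s): {sorted(unknown)}")
        if {"forward", "backward"} <= directives:
            raise ValueError("forward and backward are mutually exclusive")
        levels.append(directives)

    warnings = []
    if len(levels) > coll_weights_max:
        warnings.append("number of operands exceeds COLL_WEIGHTS_MAX")
    return levels, warnings
```
-
- For the example line above, the sketch yields two levels: the first
- compared forward and the second compared backward.
-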
- 2.5.2.2.4 Collation Order
-
- The order_start keyword shall be followed by collating element entries.
- The syntax for the collating element entries is
-
- "%s %s;%s;...;%s\n", <collating-element>, <weight>, <weight>, ...
-
- Each collating-element shall consist of either a character (in any of the
- forms defined in 2.5.2), a <collating-element>, a <collating-symbol>, an
- ellipsis, or the special symbol UNDEFINED. The order in which collating
- elements are specified determines the character collation sequence, such
- that each collating element shall compare less than the elements
- following it. The NUL character shall compare lower than any other
- character.
-
- A <collating-element> shall be used to specify multicharacter collating
- elements, and indicates that the character sequence specified via the
- <collating-element> is to be collated as a unit and in the relative order
- specified by its place.
-
- A <collating-symbol> shall be used to define a position in the relative
- order for use in weights.
-
- The ellipsis symbol (``...'') specifies that a sequence of characters
- shall collate according to their encoded character values. It shall be
- interpreted as indicating that all characters with a coded character set
- value higher than the value of the character in the preceding line, and
- lower than the coded character set value for the character in the
- following line, in the current coded character set, shall be placed in
- the character collation order between the previous and the following
- character in ascending order according to their coded character set
- values. An initial ellipsis shall be interpreted as if the preceding
- line specified the NUL character, and a trailing ellipsis as if the
- following line specified the highest coded character set value in the
- current coded character set. An ellipsis shall be treated as invalid if
- the preceding or following lines do not specify characters in the current
- coded character set. The use of the ellipsis symbol ties the definition
- to a specific coded character set and may preclude the definition from
- being portable between implementations.
-
- The symbol UNDEFINED shall be interpreted as including all coded
- character set values not specified explicitly or via the ellipsis symbol.
- Such characters shall be inserted in the character collation order at the
- point indicated by the symbol, and in ascending order according to their
- coded character set values. If no UNDEFINED symbol is specified, and the
- current coded character set contains characters not specified in this
- clause, the utility shall issue a warning message and place such
- characters at the end of the character collation order.
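-
- The handling of the ellipsis and UNDEFINED symbols can be sketched as
- follows. This Python sketch makes several simplifying assumptions that
- are not in the text: every entry is a single character, the string
- ``...'', or the string ``UNDEFINED''; the coded character set is the
- range 0 to charset_size - 1; and an initial or trailing ellipsis is
- read as if the NUL character or the highest code value had been listed
- explicitly. The function name expand_order is hypothetical.
-
```python
def expand_order(entries, charset_size=128):
    """Return (order, warnings) for a list of collation order entries."""
    entries = list(entries)
    # Bound an initial or trailing ellipsis (assumed reading, see above).
    if entries and entries[0] == "...":
        entries.insert(0, chr(0))
    if entries and entries[-1] == "...":
        entries.append(chr(charset_size - 1))

    order = []
    for i, entry in enumerate(entries):
        if entry != "...":
            order.append(entry)
            continue
        before, after = entries[i - 1], entries[i + 1]
        if len(before) != 1 or len(after) != 1:
            raise ValueError("ellipsis must lie between two characters")
        # All code values strictly between the neighbours, ascending.
        order.extend(chr(c) for c in range(ord(before) + 1, ord(after)))

    specified = {e for e in order if e != "UNDEFINED"}
    rest = [chr(c) for c in range(charset_size) if chr(c) not in specified]
    warnings = []
    if "UNDEFINED" in order:
        # Insert the unspecified characters at the marked point.
        at = order.index("UNDEFINED")
        order[at:at + 1] = rest
    elif rest:
        warnings.append("unspecified characters placed at end of order")
        order.extend(rest)
    return order, warnings
```
-
- The sketch returns the warning instead of issuing a message, and it
- does not check for characters that are listed more than once.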
-
- The optional operands for each collation-element shall be used to define
- the primary, secondary, or subsequent weights for the collating element.
- The first operand specifies the relative primary weight, the second the
- relative secondary weight, and so on. Two or more collation-elements can
- be assigned the same weight; they belong to the same equivalence class if
- they have the same primary weight. Collation shall behave as if, for
- each weight level, IGNOREd elements are removed. Then each successive
- pair of elements shall be compared according to the relative weights for
- the elements. If the two strings compare equal, the process shall be
- repeated for the next weight level, up to the limit {COLL_WEIGHTS_MAX}.
-
- Weights shall be expressed as characters (in any of the forms specified
- in 2.5.2), <collating-symbol>s, <collating-element>s, an ellipsis, or the
- special symbol IGNORE. A single character, a <collating-symbol>, or a
- <collating-element> shall represent the relative order in the character
- collating sequence of the character or symbol, rather than the character
- or characters themselves.
-
- One-to-many mapping is indicated by specifying two or more concatenated
- characters or symbolic names. Thus, if the character ``<eszet>'' is
- given the string <s><s> as a weight, comparisons shall be performed as if
- all occurrences of the character <eszet> are replaced by <s><s>. If it
- is desirable to define <eszet> and <s><s> as an equivalence class, then a
- collating-element must be defined for the string ``ss'', as in the
- example below.
-
- All characters specified via an ellipsis shall by default be assigned
- unique weights, equal to the relative order of characters. Characters
- specified via an explicit or implicit UNDEFINED special symbol shall by
- default be assigned the same primary weight (i.e., belong to the same
- equivalence class). An ellipsis symbol as a weight shall be interpreted
- to mean that each character in the sequence shall have unique weights,
- equal to the relative order of their character in the character collation
- sequence. Secondary and subsequent weights have unique values. The use
- of the ellipsis as a weight shall be treated as an error if the collating
- element is neither an ellipsis nor the special symbol UNDEFINED.
-
- The special keyword IGNORE as a weight shall indicate that when strings
- are compared using the weights at the level where IGNORE is specified,
- the collating element shall be ignored; i.e., as if the string did not
- contain the collating element. In regular expressions and pattern
- matching, all characters that are IGNOREd in their primary weight form an
- equivalence class.
-
- An empty operand shall be interpreted as the collating-element itself.
-
- For example, the order statement
-
- <a> <a>;<a>
-
- is equal to
-
- <a>
-
- An ellipsis can be used as an operand if the collating-element was an
- ellipsis, and shall be interpreted as the value of each character defined
- by the ellipsis.
-
- The collation order as defined in this clause defines the interpretation
- of bracket expressions in regular expressions (see 2.8.3.2).
-
- Example:
-
- order_start forward;backward
- UNDEFINED IGNORE;IGNORE
- <LOW>
- <space> <LOW>;<space>
- ... <LOW>;...
- <a> <a>;<a>
- <a-acute> <a>;<a-acute>
- <a-grave> <a>;<a-grave>
- <A> <a>;<A>
- <A-acute> <a>;<A-acute>
- <A-grave> <a>;<A-grave>
- <ch> <ch>;<ch>
- <Ch> <ch>;<Ch>
- <s> <s>;<s>
- <eszet> <s><s>;<eszet><eszet>
- ... <HIGH>;...
- <HIGH>
- order_end
-
- This example is interpreted as follows:
-
- (1) The UNDEFINED means that all characters not specified in this
- definition (explicitly or via the ellipsis) shall be ignored for
- collation purposes; for regular expression purposes they are
- ordered first.
-
- (2) All characters between <space> and <a> shall have the same
- primary equivalence class and individual secondary weights based
- on their ordinal encoded values.
-
- (3) All characters based on the upper- or lowercase character a
- belong to the same primary equivalence class.
-
- (4) The multicharacter collating element <c><h> is represented by
- the collating symbol <ch> and belongs to the same primary
- equivalence class as the multicharacter collating element
- <C><h>.
-
- (5) Note that it is not possible to use the collating element <ss>
- as a weight and expect it to be expanded to the string ``ss''.
- When used as a weight, any collating-element represents the
- relative order assigned to it in the character collation
- sequence, not the string from which it was derived (compare with
- <ch>).
-
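- A reduced version of this example can be traced in code. The Python
- sketch below is an illustration under stated assumptions, not a
- definitive implementation: it keeps only <a>, <a-acute>, <A>, <s>, and
- <eszet>, omits the <LOW> and <HIGH> symbols and the ellipsis entries,
- and treats every other character as UNDEFINED (IGNOREd at both
- levels). The names ORDER, WEIGHTS, and sort_key are hypothetical.
-
```python
A_ACUTE = "\u00e1"
ESZET = "\u00df"

# Character collation sequence: position in this list is the relative order.
ORDER = ["a", A_ACUTE, "A", "s", ESZET]
POSITION = {ch: i for i, ch in enumerate(ORDER)}

# (primary, secondary) weights; each weight is a string of characters
# from ORDER, so a two-character string is a one-to-many mapping.
WEIGHTS = {
    "a": ("a", "a"),
    A_ACUTE: ("a", A_ACUTE),
    "A": ("a", "A"),
    "s": ("s", "s"),
    ESZET: ("ss", ESZET + ESZET),
}
DIRECTIONS = ("forward", "backward")  # order_start forward;backward


def sort_key(text):
    """Build one list of relative-order values per weight level."""
    key = []
    for level, direction in enumerate(DIRECTIONS):
        values = []
        for ch in text:
            if ch in WEIGHTS:  # anything else is UNDEFINED, hence IGNOREd
                values.extend(POSITION[w] for w in WEIGHTS[ch][level])
        if direction == "backward":
            values.reverse()  # compare from end of string to beginning
        key.append(values)
    return tuple(key)
```
-
- In this sketch <a-acute><a> sorts before <a><a-acute> because the
- second level is scanned backward, and <eszet> has the same first-level
- key as the string ``ss'' but a different second-level key.
-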
- 2.5.2.2.5 order_end Keyword
-
- The collating order entries shall be terminated with an order_end
- keyword.
-
- BEGIN_RATIONALE
-
- 2.5.2.2.6 LC_COLLATE Rationale. (This subclause is not a part of
- P1003.2)
-
- The LC_COLLATE category governs the collation order in the locale, and
- thus the processing of the C Standard {7} strxfrm() and strcoll()
- functions, as well as a number of POSIX.2 utilities.
-
- The rules governing collation depend to some extent on the use. At
- least five different levels of increasingly complex collation rules can
- be distinguished:
-
- (1) Byte/machine code order. This is the historical collation order
- in the UNIX system and many proprietary operating systems.
- Collation is here done character by character, without any
- regard to context. The primary virtue is that it usually is
- quite fast, and also completely deterministic; it works well
- when the native machine collation sequence matches the user
- expectations.
-
- (2) Character order. On this level, collation is also done
- character by character, without regard to context. The order
- between characters is, however, not determined by the code
- values, but on the user's expectations of the ``correct'' order
- between characters. In addition, such a (simple) collation
- order can specify that certain characters collate equal (e.g.,
- upper- and lowercase letters).
-
- (3) String ordering. On this level, entire strings are compared
- based on relatively straightforward rules. At this level,
- several ``passes'' may be required to determine the order
- between two strings. Characters may be ignored in some passes,
- but not in others; the strings may be compared in different
- directions; and simple string substitutions may be made before
- strings are compared. This level is best described as
- ``dictionary'' ordering; it is based on the spelling, not the
- pronunciation, or meaning, of the words.
-
- (4) Text search ordering. This is a further refinement of the
- previous level, best described as ``telephone book ordering'';
- some common homonyms (words spelled differently but with the same
- pronunciation) are collated together; numbers are collated as if
- spelled with words, and so on.
-
- (5) Semantic level ordering. Words and strings are collated based
- on their meaning; entire words (such as ``the'') are eliminated, and
- the ordering is not deterministic. This usually requires
- special software, and is highly dependent on the intended use.
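-
- The difference between the first two levels can be shown with a short
- Python sketch. It is an illustration only: the word list in the note
- below is made up, Python compares strings by code point (which matches
- byte order for ASCII text), and folding case with str.lower merely
- stands in for a character order in which upper- and lowercase letters
- collate equal.
-
```python
def byte_order(words):
    """Level 1: compare code values character by character."""
    return sorted(words)


def character_order(words):
    """Level 2: compare characters by an order table; here case is folded."""
    return sorted(words, key=str.lower)
```
-
- For the words ``banana'', ``Cherry'', and ``apple'', byte_order() puts
- ``Cherry'' first because uppercase letters have lower code values,
- while character_order() yields ``apple'', ``banana'', ``Cherry''.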
-
- While the historical collation order formally is at level 1, for the
- English language it corresponds roughly to elements at level 2. The user
- expects to see the output from the ls utility sorted very much as it
- would be in a dictionary. While telephone book ordering would be an
- optimal goal for standard collation, this was ruled out as the order
- would be language dependent. Furthermore, a requirement was that the
- order must be determined solely from the text string and the collation
- rules; no external information (e.g., ``pronunciation dictionaries'')
- could be required.
-
- As a result, the goal for the collation support is at level 3. This also
- matches the requirements for the proposed Canadian collation order, as
- well as other, known collation requirements for alphabetic scripts. It
- specifically rules out collation based on pronunciation rules, or based
- on semantic analysis of the text.
-
- The syntax for the LC_COLLATE category source is the result of a
- cooperative effort between representatives for many countries and
- organizations working with international issues, such as UniForum,
- X/Open, and ISO. It meets the requirements for level 3 and has been
- verified to produce the correct result with examples based on French,
- Canadian, and Danish collation order, as well as meeting the requirements
- in the X/Open Portability Guide, Issue 3 {B31}. Because it supports
- multicharacter collating elements, it is also capable of supporting
- collation in code sets where a character is expressed using nonspacing
- characters followed by the base character (such as ISO 6937 {B6}).
-
- The directives that can be specified in an operand to the order_start
- keyword are based on the requirements specified in several proposed
- standards and in customary use. The following is a rephrasing of rules
- defined for ``lexical ordering in English and French'' by the Canadian
- Standards Association (text in brackets is rephrased):
-
- (1) Once special characters ([punctuation]) have been removed from
- original strings, the ordering is determined by scanning forward
- (left to right) [disregarding case and diacriticals].
-
- (2) In case of equivalence, special characters are once again
- removed from original strings and the ordering is determined
- scanning backward (starting from the rightmost character of the
- string and back), character by character, [disregarding case but
- considering diacriticals].
-
- (3) In case of repeated equivalence, special characters are removed
- again from original strings and the ordering is determined
- scanning forward, character by character, [considering both case
- and diacriticals].
-
- (4) If there is still an ordering equivalence after rules (1)
- through (3) have been applied, then only special characters and
- the position they occupy in the string are considered to
- determine ordering. The string that has a special character in
- the lowest position comes first. If two strings have a special
- character in the same position, the character [with the lowest
- collation value] comes first. In case of equality, the other
- special characters are considered until there is a difference or
- all special characters have been exhausted.
-
- It is estimated that the standard covers the requirements for all
- European languages, and no particular problems are anticipated with
- Slavic or Middle East character sets.
-
- The Far East (particularly Japanese/Chinese) collations are often based
- on contextual information and pronunciation rules (the same ideogram can
- have different meanings and different pronunciations). Such collation,
- in general, falls outside the desired goal of the standard. There are,
- however, several other collation rules (stroke/radical, or ``most common
- pronunciation'') which can be supported with the mechanism described
- here.
-
- Previous drafts contained a substitute statement, which performed a
- regular expression style replacement before string compares. It has been
- withdrawn based on balloter objections that it was not required for the
- types of ordering POSIX.2 is aimed at.
-
- The character (and collating element) order is defined by the order in
- which characters and elements are specified between the order_start and
- order_end keywords. This character order is used in range expressions in
- regular expressions (see 2.8). Weights assigned to the characters and
- elements define the collation sequence; in the absence of weights, the
- character order is also the collation sequence.
-
- The position keyword was introduced to provide the capability to
- consider, in a compare, the relative position of non-IGNOREd characters.
- As an example, consider the two strings ``o-ring'' and ``or-ing''.
- Assuming the hyphen is IGNOREd on the first pass, the two strings will
- compare equal, and the position of the hyphen is immaterial. On the second
- pass, all characters except the hyphen are IGNOREd, and in the normal
- case the two strings would again compare equal. By taking position into
- account, the first collates before the second.
-
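- The ``o-ring'' example can be traced with a short Python sketch. It is
- an illustration under assumptions not stated in the text: the names
- level_key and sort_key are hypothetical, letters are weighted by their
- code values, and the position of an element is taken to be its index
- in the string.
-
```python
IGNORE = None  # marker for an element that is IGNOREd at a level


def first_pass(ch):
    """First level: the hyphen is IGNOREd, letters use their code value."""
    return IGNORE if ch == "-" else ord(ch)


def second_pass(ch):
    """Second level: everything except the hyphen is IGNOREd."""
    return 0 if ch == "-" else IGNORE


def level_key(text, weight_of, use_position):
    """Key for one weight level, optionally recording element positions."""
    key = []
    for index, ch in enumerate(text):
        weight = weight_of(ch)
        if weight is IGNORE:
            continue
        # With position, an earlier non-IGNOREd element collates first.
        key.append((index, weight) if use_position else (weight,))
    return key


def sort_key(text, use_position):
    """Two-level key; position applies to the second level only."""
    return (level_key(text, first_pass, False),
            level_key(text, second_pass, use_position))
```
-
- Without position the two keys are identical; with position the hyphen
- in ``o-ring'' is found at an earlier index, so that string collates
- first.
-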
- END_RATIONALE
-
-
- 2.5.2.3 LC_MONETARY
-
- Table 2-8 - LC_MONETARY Category Definition in the POSIX Locale
- __________________________________________________________________________________________________________________________________________________
- LC_MONETARY
- # This is the POSIX Locale definition for
- # the LC_MONETARY category.
- #
- int_curr_symbol ""
- currency_symbol ""
- mon_decimal_point ""
- mon_thousands_sep ""
- mon_grouping ""
- positive_sign ""
- negative_sign ""
- int_frac_digits -1
- p_cs_precedes -1
- p_sep_by_space -1
- n_cs_precedes -1
- n_sep_by_space -1
- p_sign_posn -1
- n_sign_posn -1
- #
- END LC_MONETARY
- __________________________________________________________________________________________________________________________________________________
-
- The LC_MONETARY category shall define the rules and symbols that shall be
- used to format monetary numeric information. The operands are strings.
- For some keywords, the strings can contain only integers. Keywords that
- are not provided, string values set to the empty string (""), or integer
- keywords set to -1, shall be used to indicate that the value is
- unspecified. The following keywords shall be recognized:
-
- copy Specify the name of an existing locale to be
- used as the source for the definition of this
- category. If this keyword is specified, no
- other keyword shall be specified.
-
- int_curr_symbol The international currency symbol. The operand
- shall be a four-character string, with the first
- three characters containing the alphabetic
- international currency symbol in accordance with
- those specified in ISO 4217 {3} (Codes for the
- representation of currencies and funds). The
- fourth character shall be the character used to
- separate the international currency symbol from
- the monetary quantity.
-
- currency_symbol The string that shall be used as the local
- currency symbol.
-
- mon_decimal_point The operand is a string containing the symbol
- that shall be used as the decimal delimiter in
- monetary formatted quantities. In contexts
- where other standards limit the
- mon_decimal_point to a single byte, the result
- of specifying a multibyte operand is
- unspecified.
-
- mon_thousands_sep The operand is a string containing the symbol
- that shall be used as a separator for groups of
- digits to the left of the decimal delimiter in
- formatted monetary quantities. In contexts
- where other standards limit the
- mon_thousands_sep to a single byte, the result
- of specifying a multibyte operand is
- unspecified.
-
- mon_grouping Define the size of each group of digits in
- formatted monetary quantities. The operand is a
- sequence of integers separated by semicolons.
- Each integer specifies the number of digits in
- each group, with the initial integer defining
- the size of the group immediately preceding the
- decimal delimiter, and the following integers
- defining the preceding groups. If the last
- integer is not -1, then the size of the previous
- group (if any) shall be repeatedly used for the
- remainder of the digits. If the last integer is
- -1, then no further grouping shall be performed.
-
- positive_sign A string that shall be used to indicate a
- nonnegative-valued formatted monetary quantity.
-
- negative_sign A string that shall be used to indicate a
- negative-valued formatted monetary quantity.
-
- int_frac_digits An integer representing the number of fractional
- digits (those to the right of the decimal
- delimiter) to be written in a formatted monetary
- quantity using int_curr_symbol.
-
- frac_digits An integer representing the number of fractional
- digits (those to the right of the decimal
- delimiter) to be written in a formatted monetary
- quantity using currency_symbol.
-
- p_cs_precedes An integer set to 1 if the currency_symbol or
- int_curr_symbol precedes the value for a
- nonnegative formatted monetary quantity, and set
- to 0 if the symbol succeeds the value.
-
- p_sep_by_space An integer set to 0 if no space separates the
- currency_symbol or int_curr_symbol from the
- value for a nonnegative formatted monetary
- quantity, set to 1 if a space separates the
- symbol from the value, and set to 2 if a space
- separates the symbol and the sign string, if
- adjacent.
-
- n_cs_precedes An integer set to 1 if the currency_symbol or
- int_curr_symbol precedes the value for a
- negative formatted monetary quantity, and set to
- 0 if the symbol succeeds the value.
-
- n_sep_by_space An integer set to 0 if no space separates the
- currency_symbol or int_curr_symbol from the
- value for a negative formatted monetary
- quantity, set to 1 if a space separates the
- symbol from the value, and set to 2 if a space
- separates the symbol and the sign string, if
- adjacent.
-
- p_sign_posn An integer set to a value indicating the
- positioning of the positive_sign for a
- nonnegative formatted monetary quantity. The
- following integer values shall be recognized:
-
- 0 Parentheses enclose the quantity and the
- currency_symbol or int_curr_symbol.
-
- 1 The sign string precedes the quantity and
- the currency_symbol or int_curr_symbol.
-
- 2 The sign string succeeds the quantity and
- the currency_symbol or int_curr_symbol.
-
- 3 The sign string immediately precedes the
- currency_symbol or int_curr_symbol.
-
- 4 The sign string immediately succeeds the
- currency_symbol or int_curr_symbol.
-
- n_sign_posn An integer set to a value indicating the
- positioning of the negative_sign for a negative
- formatted monetary quantity. The following
- integer values shall be recognized:
-
- 0 Parentheses enclose the quantity and the
- currency_symbol or int_curr_symbol.
-
- 1 The sign string precedes the quantity and
- the currency_symbol or int_curr_symbol.
-
- 2 The sign string succeeds the quantity and
- the currency_symbol or int_curr_symbol.
-
- 3 The sign string immediately precedes the
- currency_symbol or int_curr_symbol.
-
- 4 The sign string immediately succeeds the
- currency_symbol or int_curr_symbol.
-
- BEGIN_RATIONALE
-
- 2.5.2.3.1 LC_MONETARY Rationale. (This subclause is not a part of
- P1003.2)
-
- The currency symbol does not appear in LC_MONETARY because it is not
- defined in the C Standard's {7} C locale.
-
- The C Standard {7} limits the size of decimal points and thousands
- delimiters to single-byte values. In locales based on multibyte coded
- character sets this cannot be enforced; this standard does not
- prohibit such characters, but makes the behavior unspecified [in the text
- ``In contexts where other standards ...''].
-
- The grouping specification is based on, but not identical to, the
- C Standard {7}. The ``-1'' signals that no further grouping shall be
- performed; it is the equivalent of {CHAR_MAX} in the C Standard {7}.
-
- The locale definition is an extension of the C Standard {7} localeconv()
- specification. In particular, rules on how currency_symbol is treated
- are extended to also cover int_curr_symbol, and p_sep_by_space and
- n_sep_by_space have been augmented with the value 2, which places a space
- between the sign and the symbol (if they are adjacent; otherwise it
- should be treated as a 0). The following table shows the result of
- various combinations:
-
-                                       p_sep_by_space
-                                       2          1          0
-
- p_cs_precedes = 1   p_sign_posn = 0   ($1.25)    ($ 1.25)   ($1.25)
-                     p_sign_posn = 1   + $1.25    +$ 1.25    +$1.25
-                     p_sign_posn = 2   $1.25 +    $ 1.25+    $1.25+
-                     p_sign_posn = 3   + $1.25    +$ 1.25    +$1.25
-                     p_sign_posn = 4   $ +1.25    $+ 1.25    $+1.25
-
- p_cs_precedes = 0   p_sign_posn = 0   (1.25 $)   (1.25 $)   (1.25$)
-                     p_sign_posn = 1   +1.25 $    +1.25 $    +1.25$
-                     p_sign_posn = 2   1.25$ +    1.25 $+    1.25$+
-                     p_sign_posn = 3   1.25+ $    1.25 +$    1.25+$
-                     p_sign_posn = 4   1.25$ +    1.25 $+    1.25$+
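-
- As an informal illustration (not part of the draft), the LC_MONETARY
- keywords can be inspected through a language binding of localeconv().
- The sketch below uses Python's locale.localeconv(), which returns a
- dictionary keyed by these same names; the helper name
- monetary_settings is hypothetical, and the values seen depend on the
- locale in effect on the host system.
-
```python
import locale

# Keyword names as listed in the LC_MONETARY description above.
MONETARY_KEYWORDS = [
    "int_curr_symbol", "currency_symbol",
    "mon_decimal_point", "mon_thousands_sep", "mon_grouping",
    "positive_sign", "negative_sign",
    "int_frac_digits", "frac_digits",
    "p_cs_precedes", "p_sep_by_space",
    "n_cs_precedes", "n_sep_by_space",
    "p_sign_posn", "n_sign_posn",
]


def monetary_settings():
    """Return the monetary members of localeconv() for the current locale."""
    conv = locale.localeconv()
    return {name: conv[name] for name in MONETARY_KEYWORDS}


# Example use: print each keyword with its current value.
# for name, value in monetary_settings().items():
#     print(name, repr(value))
```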
-
- The following is an example of the interpretation of the mon_grouping
- keyword. Assuming that the value to be formatted is 123456789 and the
- mon_thousands_sep is ', then the following table shows the result. The
- third column shows the equivalent C Standard {7} string that would be
- used to accommodate this grouping. It is the responsibility of the
- utility to perform mappings of the formats in this clause to those used
- by language bindings such as the C Standard {7}.
-
- mon_grouping   Formatted Value   C Standard {7} String
- ____________   _______________   _____________________
- 3;-1           123456'789        "\3\177"
- 3              123'456'789       "\3"
- 3;2;-1         1234'56'789       "\3\2\177"
- 3;2            12'34'56'789      "\3\2"
- -1             123456789         "\177"
-
- In these examples, the octal value of {CHAR_MAX} is 177.
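-
- The grouping rule and the mapping to a C Standard {7} grouping string
- can be sketched as follows. This is a minimal illustration in Python,
- not part of the draft: the names group_digits and to_c_grouping are
- hypothetical, {CHAR_MAX} is assumed to be 127 (octal 177) as in the
- table above, and a nonpositive group size other than -1 is assumed to
- stop grouping.
-
```python
CHAR_MAX = 0o177  # assumption: {CHAR_MAX} is 127, as in the examples above


def group_digits(digits, grouping, sep):
    """Insert sep into a digit string according to a mon_grouping list.

    grouping is the operand as a list of integers, e.g. [3, 2, -1] for
    "3;2;-1". The first integer sizes the group nearest the decimal
    delimiter; -1 stops grouping; otherwise the last size repeats.
    """
    groups = []
    rest = digits
    size = None
    index = 0
    while True:
        if index < len(grouping):
            if grouping[index] == -1:
                break  # no further grouping
            size = grouping[index]
            index += 1
        # Past the end of the list, the previous size is reused.
        if size is None or size <= 0 or len(rest) <= size:
            break
        groups.append(rest[-size:])
        rest = rest[:-size]
    groups.append(rest)
    return sep.join(reversed(groups))


def to_c_grouping(grouping):
    """Map a mon_grouping list to the bytes of a C grouping string."""
    return bytes(CHAR_MAX if n == -1 else n for n in grouping)
```
-
- For instance, group_digits("123456789", [3, 2], "'") reproduces the
- fourth row of the table.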
-
- END_RATIONALE
-
- 2.5.2.4 LC_NUMERIC
-
- The LC_NUMERIC category shall define the rules and symbols that shall be
- used to format nonmonetary numeric information. The operands are
- strings. For some keywords, the strings can contain only integers.
- Keywords that are not provided, string values set to the empty string
- (""), or integer keywords set to -1, shall be used to indicate that the
- value is unspecified. The following keywords shall be recognized:
-
- copy Specify the name of an existing locale to be used
- as the source for the definition of this category.
- If this keyword is specified, no other keyword
- shall be specified.
-
- decimal_point The operand is a string containing the symbol that
- shall be used as the decimal delimiter in numeric,
- nonmonetary formatted quantities. This keyword
- cannot be omitted and cannot be set to the empty
- string. In contexts where other standards limit
- the decimal_point to a single byte, the result of
- specifying a multibyte operand is unspecified.
-
- thousands_sep The operand is a string containing the symbol that
- shall be used as a separator for groups of digits
- to the left of the decimal delimiter in numeric,
- nonmonetary formatted quantities. In
- contexts where other standards limit the
- thousands_sep to a single byte, the result of
- specifying a multibyte operand is unspecified.
-
- grouping Define the size of each group of digits in
- formatted nonmonetary quantities. The operand is a
- sequence of integers separated by semicolons. Each
- integer specifies the number of digits in each
- group, with the initial integer defining the size
- of the group immediately preceding the decimal
- delimiter, and the following integers defining the
- preceding groups. If the last integer is not -1,
- then the size of the previous group (if any) shall
- be repeatedly used for the remainder of the digits.
- If the last integer is -1, then no further grouping
- shall be performed.
-
- Table 2-9 - LC_NUMERIC Category Definition in the POSIX Locale
- __________________________________________________________________________________________________________________________________________________
- LC_NUMERIC
- # This is the POSIX Locale definition for
- # the LC_NUMERIC category.
- #
- decimal_point "<period>"
- thousands_sep ""
- grouping 0
- #
- END LC_NUMERIC
- __________________________________________________________________________________________________________________________________________________
-
- BEGIN_RATIONALE
-
- 2.5.2.4.1 LC_NUMERIC Rationale. (This subclause is not a part of
- P1003.2)
-
- See the rationale for LC_MONETARY (2.5.2.3.1) for a description of the
- behavior of grouping.
-
- END_RATIONALE
-
-
- 2.5.2.5 LC_TIME
-
- The LC_TIME category shall define the interpretation of the field
- descriptors supported by the date utility (see 4.15).
-
- Table 2-10 - LC_TIME Category Definition in the POSIX Locale
- __________________________________________________________________________________________________________________________________________________
- LC_TIME
- # This is the POSIX Locale definition for
- # the LC_TIME category.
- #
- # Abbreviated weekday names (%a)
- abday "<S><u><n>";"<M><o><n>";"<T><u><e>";"<W><e><d>";\
- "<T><h><u>";"<F><r><i>";"<S><a><t>"
- #
- # Full weekday names (%A)
- day "<S><u><n><d><a><y>";"<M><o><n><d><a><y>";\
- "<T><u><e><s><d><a><y>";"<W><e><d><n><e><s><d><a><y>";\
- "<T><h><u><r><s><d><a><y>";"<F><r><i><d><a><y>";\
- "<S><a><t><u><r><d><a><y>"
- #
- # Abbreviated month names (%b)
- abmon "<J><a><n>";"<F><e><b>";"<M><a><r>";\
- "<A><p><r>";"<M><a><y>";"<J><u><n>";\
- "<J><u><l>";"<A><u><g>";"<S><e><p>";\
- "<O><c><t>";"<N><o><v>";"<D><e><c>"
- #
- # Full month names (%B)
- mon "<J><a><n><u><a><r><y>";"<F><e><b><r><u><a><r><y>";\
- "<M><a><r><c><h>";"<A><p><r><i><l>";\
- "<M><a><y>";"<J><u><n><e>";\
- "<J><u><l><y>";"<A><u><g><u><s><t>";\
- "<S><e><p><t><e><m><b><e><r>";"<O><c><t><o><b><e><r>";\
- "<N><o><v><e><m><b><e><r>";"<D><e><c><e><m><b><e><r>"
- #
- # Equivalent of AM/PM (%p) "AM";"PM"
- am_pm "<A><M>";"<P><M>"
- #
- # Appropriate date and time representation (%c)
- # "%a %b %e %H:%M:%S %Y"
- d_t_fmt "<percent-sign><a><space><percent-sign><b><space><percent-sign><e>\
- <space><percent-sign><H><colon><percent-sign><M>\
- <colon><percent-sign><S><space><percent-sign><Y>"
- #
- # Appropriate date representation (%x) "%m/%d/%y"
- d_fmt "<percent-sign><m><slash><percent-sign><d><slash><percent-sign><y>"
- #
- # Appropriate time representation (%X) "%H:%M:%S"
- t_fmt "<percent-sign><H><colon><percent-sign><M><colon><percent-sign><S>"
- #
- # Appropriate 12-hour time representation (%r) "%I:%M:%S %p"
- t_fmt_ampm "<percent-sign><I><colon><percent-sign><M><colon>\
- <percent-sign><S><space><percent-sign><p>"
- #
- END LC_TIME
-
- __________________________________________________________________________________________________________________________________________________
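-
- The symbolic names used in the table can be expanded mechanically. The
- following Python sketch is illustrative only: the SYMBOLS mapping is a
- hypothetical stand-in for a small part of a charmap, expand_symbols is
- not a name defined by this standard, and single-letter names such as
- <a> are assumed to stand for the letter itself.
-
```python
import re
from datetime import date

# Hypothetical subset of a charmap: symbolic name -> character.
SYMBOLS = {
    "percent-sign": "%",
    "space": " ",
    "colon": ":",
    "slash": "/",
}


def expand_symbols(operand):
    """Replace each <name> in operand with the character it denotes."""
    def replace(match):
        name = match.group(1)
        if name in SYMBOLS:
            return SYMBOLS[name]
        if len(name) == 1:
            return name  # assumption: <a>, <H>, ... denote themselves
        raise KeyError("unknown symbolic name: " + name)

    return re.sub(r"<([^<>]+)>", replace, operand)


# The d_fmt operand from Table 2-10, without the enclosing quotes.
D_FMT = "<percent-sign><m><slash><percent-sign><d><slash><percent-sign><y>"

# Expanded, it is an ordinary field-descriptor string usable by strftime.
formatted = date(1991, 9, 1).strftime(expand_symbols(D_FMT))
```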
-
-
-
-
- The following mandatory keywords shall be recognized:
-
- copy Specify the name of an existing locale to be used as the
- source for the definition of this category. If this
- keyword is specified, no other keyword shall be specified.
-
- abday Define the abbreviated weekday names, corresponding to the
- %a field descriptor. The operand shall consist of seven
- semicolon-separated strings. The first string shall be the
- abbreviated name of the first day of the week (Sunday), the
- second the abbreviated name of the second day, and so on.
-
- day Define the full weekday names, corresponding to the %A
- field descriptor. The operand shall consist of seven
- semicolon-separated strings. The first string shall be the
- full name of the first day of the week (Sunday), the second
- the full name of the second day, and so on.
-
- abmon Define the abbreviated month names, corresponding to the %b
- field descriptor. The operand shall consist of twelve
- semicolon-separated strings. The first string shall be the
- abbreviated name of the first month of the year (January),
- the second the abbreviated name of the second month, and so
- on.
-
- mon Define the full month names, corresponding to the %B field
- descriptor. The operand shall consist of twelve
- semicolon-separated strings. The first string shall be the
- full name of the first month of the year (January), the
- second the full name of the second month, and so on.
-
- d_t_fmt Define the appropriate date and time representation,
- corresponding to the %c field descriptor. The operand
- shall consist of a string, and can contain any combination
- of characters and field descriptors. In addition, the
- string can contain escape sequences defined in Table 2-15.
-
- d_fmt Define the appropriate date representation, corresponding
- to the %x field descriptor. The operand shall consist of a
- string, and can contain any combination of characters and
- field descriptors. In addition, the string can contain
- escape sequences defined in Table 2-15.
-
- t_fmt Define the appropriate time representation, corresponding
- to the %X field descriptor. The operand shall consist of a
- string, and can contain any combination of characters and
- field descriptors. In addition, the string can contain
- escape sequences defined in Table 2-15.
-
- am_pm Define the appropriate representation of the ante meridiem
- and post meridiem strings, corresponding to the %p field
- descriptor. The operand shall consist of two strings,
- separated by a semicolon. The first string shall represent
- the ante meridiem designation, the last string the post
- meridiem designation.
-
- t_fmt_ampm
- Define the appropriate time representation in the 12-hour
- clock format with am_pm, corresponding to the %r field
- descriptor. The operand shall consist of a string and can
- contain any combination of characters and field
- descriptors. If the string is empty, the 12-hour format is
- not supported in the locale.
-
- It is implementation defined whether the following optional keywords
- shall be recognized. If they are not supported, but present in a
- localedef source, they shall be ignored.
-
- era Shall be used to define alternate Eras, corresponding
- to the %E field descriptor modifier. The format of the
- operand is unspecified, but shall support the
- definition of the %EC and %Ey field descriptors, and
- may also define the era_year format (%EY).
-
- era_year Shall be used to define the format of the year in
- alternate Era format, corresponding to the %EY field
- descriptor.
-
- era_d_fmt Shall be used to define the format of the date in
- alternate Era notation, corresponding to the %Ex field
- descriptor.
-
- alt_digits Shall be used to define alternate symbols for digits,
- corresponding to the %O field descriptor modifier. The
- operand shall consist of semicolon-separated strings.
- The first string shall be the alternate symbol
- corresponding with zero, the second string the symbol
- corresponding with one, and so on. Up to 100 alternate
- symbol strings can be specified. The %O modifier
- indicates that the string corresponding to the value
- specified via the field descriptor shall be used
- instead of the value.
-
- BEGIN_RATIONALE
-
- 2.5.2.5.1 LC_TIME Rationale. (This subclause is not a part of P1003.2)
-
- Although certain of the field descriptors in the POSIX Locale (such as
- the name of the month) are shown with initial capital letters, this need
- not be the case in other locales. Programs using these fields may need
- to adjust the capitalization if the output is going to be used at the
- beginning of a sentence.
-
- The LC_TIME descriptions of abday, day, and abmon imply a Gregorian
- style calendar (7-day weeks, 12-month years, leap years, etc.).
- Formatting time strings for other types of calendars is outside the scope
- of this standard.
-
- As specified under the date command, the field descriptors corresponding
- to the optional keywords consist of a modifier followed by a traditional
- field descriptor (for instance %Ex). If the optional keywords are not
- supported by the implementation or are unspecified for the current
- locale, these field descriptors shall be treated as the traditional field
- descriptor. For instance, assume the following keywords:
-
- alt_digits "0th";"1st";"2nd";"3rd";"4th";"5th";\
- "6th";"7th";"8th";"9th";"10th"
-
- d_fmt "The %Od day of %B in %Y"
-
- On 7/4/1776, the %x field descriptor would result in ``The 4th day of
- July in 1776,'' while 7/14/1789 would come out as ``The 14 day of July in
- 1789,'' because no alternate symbol is defined for 14. Note that the
- above example is for illustrative purposes only; the %O modifier is
- primarily intended to provide for Kanji or Hindi digits in date formats.
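-
- The fallback behavior in this example can be sketched in Python. The
- names format_od and format_example_date are hypothetical, the month
- names are those of the POSIX Locale, and only the single format string
- "The %Od day of %B in %Y" is modeled; this is not a general field
- descriptor implementation.
-
```python
# alt_digits operand from the example: index 0 is the symbol for zero.
ALT_DIGITS = ["0th", "1st", "2nd", "3rd", "4th", "5th",
              "6th", "7th", "8th", "9th", "10th"]

# Full month names (%B) as defined for the POSIX Locale.
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def format_od(value, alt_digits):
    """Return the alternate symbol for value, or the plain number if
    alt_digits defines no symbol for it."""
    if 0 <= value < len(alt_digits):
        return alt_digits[value]
    return str(value)


def format_example_date(month, day, year, alt_digits=ALT_DIGITS):
    """Model d_fmt "The %Od day of %B in %Y" for a given date."""
    return "The {} day of {} in {}".format(
        format_od(day, alt_digits), MONTHS[month - 1], year)
```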
-
- While it is clear that an alternate year format is required, there is no
- consensus on the format or the requirements. As a result, while these
- keywords are reserved, the details are left unspecified. It is expected
- that National Standards Bodies will provide specifications.
-
- END_RATIONALE
-
-
- 2.5.2.6 LC_MESSAGES
-
- The LC_MESSAGES category shall define the format and values for
- affirmative and negative responses. The operands shall be strings or
- extended regular expressions; see 2.8.4. The following keywords shall be
- recognized:
-
- copy Specify the name of an existing locale to be used as the
- source for the definition of this category. If this
- keyword is specified, no other keyword shall be specified.
-
- yesexpr The operand shall consist of an extended regular expression
- that describes the acceptable affirmative response to a
- question expecting an affirmative or negative response.
-
- noexpr The operand shall consist of an extended regular expression
- that describes the acceptable negative response to a
- question expecting an affirmative or negative response.
-
- Table 2-11 - LC_MESSAGES Category Definition in the POSIX Locale
- __________________________________________________________________________________________________________________________________________________
- LC_MESSAGES
- # This is the POSIX Locale definition for
- # the LC_MESSAGES category.
- #
- yesexpr "<circumflex><left-square-bracket><y><Y><right-square-bracket>"
- #
- noexpr "<circumflex><left-square-bracket><n><N><right-square-bracket>"
- END LC_MESSAGES
- __________________________________________________________________________________________________________________________________________________
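-
- As an informal sketch of how a utility might apply these expressions,
- the Python fragment below tests a response against the POSIX Locale
- values of yesexpr and noexpr. Python's re module is not an
- implementation of the extended regular expressions of 2.8.4; it is used
- here on the assumption that these two simple patterns behave the same
- in both. The name classify_response is hypothetical.
-
```python
import re

# POSIX Locale values from Table 2-11, with symbolic names expanded.
YESEXPR = "^[yY]"
NOEXPR = "^[nN]"


def classify_response(response, yesexpr=YESEXPR, noexpr=NOEXPR):
    """Return "yes", "no", or None if the response matches neither."""
    if re.search(yesexpr, response):
        return "yes"
    if re.search(noexpr, response):
        return "no"
    return None
```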
-
-
- BEGIN_RATIONALE
-
- 2.5.2.6.1 LC_MESSAGES Rationale. (This subclause is not a part of
- P1003.2)
-
- The LC_MESSAGES category is described in 2.6 as affecting the language
- used by utilities for their output. The mechanism used by the
- implementation to accomplish this, other than the responses shown here in
- the locale definition file, is not specified by this version of this
- standard. The POSIX.1 working group is developing an interface that
- would allow applications (and, presumably, some of the standard utilities)
- to access messages from various message catalogs, tailored to a user's
- LC_MESSAGES value.
-
- END_RATIONALE
-
-
- 2.5.3 Locale Definition Grammar
-
- The grammar and lexical conventions in this subclause shall together
- describe the syntax for the locale definition source. The general
- conventions for this style of grammar are described in 2.1.2. Any
- discrepancies found between this grammar and other descriptions in this
- clause shall be resolved in favor of this grammar.
-
-
- 2.5.3.1 Locale Lexical Conventions
-
- The lexical conventions for the locale definition grammar are described
- in this subclause.
-
- The following tokens shall be processed (in addition to those string
- constants shown in the grammar):
-
- LOC_NAME A string of characters representing the name of a
- locale.
-
- CHAR Any single character.
-
- NUMBER A decimal number, represented by one or more decimal
- digits.
-
- COLLSYMBOL A symbolic name, enclosed between angle brackets. The
- string shall not duplicate any charmap symbol defined
- in the current charmap (if any), or a COLLELEMENT
- symbol.
-
- COLLELEMENT A symbolic name, enclosed between angle brackets, which
- shall not duplicate either any charmap symbol or a
- COLLSYMBOL symbol.
-
- CHARSYMBOL A symbolic name, enclosed between angle brackets, from
- the current charmap (if any).
-
- OCTAL_CHAR One or more octal representations of the encoding of
- each byte in a single character. The octal
- representation consists of an escape_char (normally a
- backslash) followed by two or more octal digits.
-
- HEX_CHAR One or more hexadecimal representations of the encoding
- of each byte in a single character. The hexadecimal
- representation consists of an escape_char followed by
- the constant 'x' and two or more hexadecimal digits.
-
- DECIMAL_CHAR One or more decimal representations of the encoding of
- each byte in a single character. The decimal
- representation consists of an escape_char followed
- by a 'd' and two or more decimal digits.
-
- ELLIPSIS The string ``...''.
-
- EXTENDED_REG_EXP
- An extended regular expression as defined in the
- grammar in 2.8.5.2.
-
- EOL The line termination character <newline>.
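-
- The three numeric character constants can be decoded along the
- following lines. This Python sketch is illustrative only: the name
- decode_char_constant is hypothetical, the escape_char defaults to a
- backslash, and, as a simplification, the sketch does not reject a
- token that mixes octal, hexadecimal, and decimal forms.
-
```python
import re


def decode_char_constant(token, escape_char="\\"):
    """Decode an OCTAL_CHAR, HEX_CHAR, or DECIMAL_CHAR token to bytes.

    Each byte is written as escape_char followed by two or more octal
    digits, by 'x' and two or more hexadecimal digits, or by 'd' and
    two or more decimal digits.
    """
    byte_re = re.compile(
        re.escape(escape_char)
        + r"(?:x([0-9A-Fa-f]{2,})|d([0-9]{2,})|([0-7]{2,}))")
    values = []
    pos = 0
    while pos < len(token):
        match = byte_re.match(token, pos)
        if match is None:
            raise ValueError("not a character constant: " + token)
        hex_digits, dec_digits, oct_digits = match.groups()
        if hex_digits is not None:
            value = int(hex_digits, 16)
        elif dec_digits is not None:
            value = int(dec_digits, 10)
        else:
            value = int(oct_digits, 8)
        if value > 0xFF:
            raise ValueError("value does not fit in one byte: " + token)
        values.append(value)
        pos = match.end()
    if not values:
        raise ValueError("empty character constant")
    return bytes(values)
```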
-
-
- 2.5.3.2 Locale Grammar
-
- This subclause presents the grammar for the locale definition.
-
- %token LOC_NAME
- %token CHAR
- %token NUMBER
- %token COLLSYMBOL COLLELEMENT
- %token CHARSYMBOL OCTAL_CHAR HEX_CHAR DECIMAL_CHAR
- %token ELLIPSIS
- %token EXTENDED_REG_EXP
- %token EOL
-
- %start locale_definition
-
- %%
-
- locale_definition    : global_statements locale_categories
-                      | locale_categories
-                      ;
-
- global_statements    : global_statements symbol_redefine
-                      | symbol_redefine
-                      ;
-
- symbol_redefine      : '#escape_char' CHAR EOL
-                      | '#comment_char' CHAR EOL
-                      ;
-
- locale_categories    : locale_categories locale_category
-                      | locale_category
-                      ;
-
- locale_category      : lc_ctype | lc_collate | lc_messages
-                      | lc_monetary | lc_numeric | lc_time
-                      ;
-
- /* The following grammar rules are common to all categories */
-
- char_list            : char_list char_symbol
-                      | char_symbol
-                      ;
-
- char_symbol          : CHAR | CHARSYMBOL
-                      | OCTAL_CHAR | HEX_CHAR | DECIMAL_CHAR
-                      ;
-
- locale_name          : LOC_NAME
-                      | '"' LOC_NAME '"'
-                      ;
-
- /* The following is the LC_CTYPE category grammar */
-
- lc_ctype             : ctype_hdr ctype_keywords ctype_tlr
-                      | ctype_hdr 'copy' locale_name EOL ctype_tlr
-                      ;
-
- ctype_hdr            : 'LC_CTYPE' EOL
-                      ;
-
- ctype_keywords       : ctype_keywords ctype_keyword
-                      | ctype_keyword
-                      ;
-
- ctype_keyword        : charclass_keyword charclass_list EOL
-                      | charconv_keyword charconv_list EOL
-                      ;
-
- charclass_keyword    : 'upper' | 'lower' | 'alpha' | 'digit'
-                      | 'alnum' | 'xdigit' | 'space' | 'print'
-                      | 'graph' | 'blank' | 'cntrl'
-                      ;
-
- charclass_list       : charclass_list ';' char_symbol
-                      | charclass_list ';' ELLIPSIS ';' char_symbol
-                      | char_symbol
-                      ;
-
- charconv_keyword     : 'toupper'
-                      | 'tolower'
-                      ;
-
- charconv_list        : charconv_list ';' charconv_entry
-                      | charconv_entry
-                      ;
-
- charconv_entry       : '(' char_symbol ',' char_symbol ')'
-                      ;
-
- ctype_tlr            : 'END' 'LC_CTYPE' EOL
-                      ;
-
- /* The following is the LC_COLLATE category grammar */
-
- lc_collate           : collate_hdr collate_keywords collate_tlr
-                      | collate_hdr 'copy' locale_name EOL collate_tlr
-                      ;
-
- collate_hdr          : 'LC_COLLATE' EOL
-                      ;
-
- collate_keywords     : order_statements
-                      | opt_statements order_statements
-                      ;
-
- opt_statements       : opt_statements collating_symbols
-                      | opt_statements collating_elements
-                      | collating_symbols
-                      | collating_elements
-                      ;
-
- collating_symbols    : 'collating-symbol' COLLSYMBOL EOL
-                      ;
-
- collating_elements   : 'collating-element' COLLELEMENT
-                        'from' '"' char_list '"' EOL
-                      ;
-
- order_statements     : order_start collation_order order_end
-                      ;
-
- order_start          : 'order_start' EOL
-                      | 'order_start' order_opts EOL
-                      ;
-
- order_opts           : order_opts ';' order_opt
-                      | order_opt
-                      ;
-
- order_opt            : order_opt ',' opt_word
-                      | opt_word
-                      ;
-
- opt_word             : 'forward' | 'backward' | 'position'
-                      ;
-
- collation_order      : collation_order collation_entry
-                      | collation_entry
-                      ;
-
- collation_entry      : COLLSYMBOL EOL
-                      | collation_element weight_list EOL
-                      | collation_element EOL
-                      ;
-
- collation_element    : char_symbol
-                      | COLLELEMENT
-                      | ELLIPSIS
-                      | 'UNDEFINED'
-                      ;
-
- weight_list          : weight_list ';' weight_symbol
-                      | weight_list ';'
-                      | weight_symbol
-                      ;
-
- weight_symbol        : char_symbol
-                      | COLLSYMBOL
-                      | '"' char_list '"'
-                      | ELLIPSIS
-                      | 'IGNORE'
-                      ;
-
- order_end            : 'order_end' EOL
-                      ;
-
- collate_tlr          : 'END' 'LC_COLLATE' EOL
-                      ;
-
- /* The following is the LC_MESSAGES category grammar */
-
- lc_messages          : messages_hdr messages_keywords messages_tlr
-                      | messages_hdr 'copy' locale_name EOL messages_tlr
-                      ;
-
- messages_hdr         : 'LC_MESSAGES' EOL
-                      ;
-
- messages_keywords    : messages_keywords messages_keyword
-                      | messages_keyword
-                      ;
-
- messages_keyword     : 'yesexpr' '"' EXTENDED_REG_EXP '"' EOL
-                      | 'noexpr' '"' EXTENDED_REG_EXP '"' EOL
-                      ;
-
- messages_tlr         : 'END' 'LC_MESSAGES' EOL
-                      ;
-
- /* The following is the LC_MONETARY category grammar */
-
- lc_monetary          : monetary_hdr monetary_keywords monetary_tlr
-                      | monetary_hdr 'copy' locale_name EOL monetary_tlr
-                      ;
-
- monetary_hdr         : 'LC_MONETARY' EOL
-                      ;
-
- monetary_keywords    : monetary_keywords monetary_keyword
-                      | monetary_keyword
-                      ;
-
- monetary_keyword     : mon_keyword_string mon_string EOL
-                      | mon_keyword_char NUMBER EOL
-                      | mon_keyword_char '-1' EOL
-                      | mon_keyword_grouping mon_group_list EOL
-                      ;
-
- mon_keyword_string   : 'int_curr_symbol' | 'currency_symbol'
-                      | 'mon_decimal_point' | 'mon_thousands_sep'
-                      | 'positive_sign' | 'negative_sign'
-                      ;
-
- mon_string           : '"' char_list '"'
-                      | '""'
-                      ;
-
- mon_keyword_char     : 'int_frac_digits' | 'frac_digits'
-                      | 'p_cs_precedes' | 'p_sep_by_space'
-                      | 'n_cs_precedes' | 'n_sep_by_space'
-                      | 'p_sign_posn' | 'n_sign_posn'
-                      ;
-
- mon_keyword_grouping : 'mon_grouping'
-                      ;
-
- mon_group_list       : NUMBER
-                      | mon_group_list ';' NUMBER
-                      ;
-
- monetary_tlr         : 'END' 'LC_MONETARY' EOL
-                      ;
-
- /* The following is the LC_NUMERIC category grammar */
-
- lc_numeric           : numeric_hdr numeric_keywords numeric_tlr
-                      | numeric_hdr 'copy' locale_name EOL numeric_tlr
-                      ;
-
- numeric_hdr          : 'LC_NUMERIC' EOL
-                      ;
-
- numeric_keywords     : numeric_keywords numeric_keyword
-                      | numeric_keyword
-                      ;
-
- numeric_keyword      : num_keyword_string num_string EOL
-                      | num_keyword_grouping num_group_list EOL
-                      ;
-
- num_keyword_string   : 'decimal_point'
-                      | 'thousands_sep'
-                      ;
-
- num_string           : '"' char_list '"'
-                      | '""'
-                      ;
-
- num_keyword_grouping : 'grouping'
-                      ;
-
- num_group_list       : NUMBER
-                      | num_group_list ';' NUMBER
-                      ;
-
- numeric_tlr          : 'END' 'LC_NUMERIC' EOL
-                      ;
-
- /* The following is the LC_TIME category grammar */
-
- lc_time              : time_hdr time_keywords time_tlr
-                      | time_hdr 'copy' locale_name EOL time_tlr
-                      ;
-
- time_hdr             : 'LC_TIME' EOL
-                      ;
-
- time_keywords        : time_keywords time_keyword
-                      | time_keyword
-                      ;
-
- time_keyword         : time_keyword_name time_list EOL
-                      | time_keyword_fmt time_string EOL
-                      | time_keyword_opt time_list EOL
-                      ;
-
- time_keyword_name    : 'abday' | 'day' | 'abmon' | 'mon'
-                      ;
-
- time_keyword_fmt     : 'd_t_fmt' | 'd_fmt' | 't_fmt' | 'am_pm' | 't_fmt_ampm'
-                      ;
-
- time_keyword_opt     : 'era' | 'era_year' | 'era_d_fmt' | 'alt_digits'
-                      ;
-
- time_list            : time_list ';' time_string
-                      | time_string
-                      ;
-
- time_string          : '"' char_list '"'
-                      ;
-
- time_tlr             : 'END' 'LC_TIME' EOL
-                      ;
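-
- The outer shape shared by every category in this grammar (a header
- line, either keyword lines or a single copy line, and an END trailer)
- can be checked with a few lines of code. The Python sketch below is
- illustrative only and does not implement the grammar: parse_category is
- a hypothetical name, operands are returned as unparsed text, comment
- lines are assumed to begin with the comment character, and a trailing
- escape character is assumed to continue a line.
-
```python
def parse_category(text, comment_char="#", escape_char="\\"):
    """Split one category into its name and a keyword -> operand dict."""
    logical = []
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank and comment lines unless inside a continued line.
        if not pending and (not line or line.startswith(comment_char)):
            continue
        if line.endswith(escape_char):
            pending += line[:-len(escape_char)]
            continue
        logical.append(pending + line)
        pending = ""
    if pending:
        raise ValueError("unterminated continuation line")
    if len(logical) < 2:
        raise ValueError("missing header or trailer")

    name = logical[0]
    if logical[-1].split() != ["END", name]:
        raise ValueError("trailer does not match header " + name)

    keywords = {}
    for line in logical[1:-1]:
        parts = line.split(None, 1)
        keywords[parts[0]] = parts[1] if len(parts) > 1 else ""
    if not keywords:
        raise ValueError("category has no keywords")
    if "copy" in keywords and len(keywords) > 1:
        raise ValueError("copy cannot be combined with other keywords")
    return name, keywords
```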
-
- BEGIN_RATIONALE
-
- 2.5.4 Locale Definition Example. (This subclause is not a part of
- P1003.2)
-
- The following is an example of a locale definition file that could be
- used as input to the localedef utility. It assumes that the utility is
- executed with the -f option, naming a charmap file with (at least) the
- following content:
-
- CHARMAP
- <space>      \x20
- <dollar>     \x24
- <A>          \101
- <a>          \141
- <A-acute>    \346
- <a-acute>    \365
- <A-grave>    \300
- <a-grave>    \366
- <b>          \142
- <C>          \103
- <c>          \143
- <c-cedilla>  \347
- <d>          \x64
- <H>          \110
- <h>          \150
- <eszet>      \xb7
- <s>          \x73
- <z>          \x7a
- END CHARMAP
-
- It should not be taken as complete or to represent any actual locale, but
- only to illustrate the syntax.
-
- A further set of examples is offered as part of Annex F.
-
- #
- LC_CTYPE
- lower <a>;<b>;<c>;<c-cedilla>;<d>;...;<z>
- upper A;B;C;Ç;...;Z
- space \x20;\x09;\x0a;\x0b;\x0c;\x0d
- blank \040;\011
- toupper (<a>,<A>);(b,B);(c,C);(ç,Ç);(d,D);(z,Z)
- END LC_CTYPE
- #
- LC_COLLATE
- #
- # The following example of collation is based on the proposed
- # Canadian standard Z243.4.1-1990, "Canadian Alphanumeric
- # Ordering Standard For Character sets of CSA Z243.4 Standard".
- # (Other parts of this example locale definition file do not
- # purport to relate to Canada, or to any other real culture.)
- # The proposed standard defines a 4-weight collation, such that
- # in the first pass, characters are compared without regard to
- # case or accents; in the second pass, backwards compare without
- # regard to case; in the third pass, forward compare without
- # regard to diacriticals. In the first three passes, non-alphabetic
- # characters are ignored; in the fourth pass, only special
- # characters are considered, such that "The string that has a
- # special character in the lowest position comes first. If two
- # strings have a special character in the same position, the
- # collation value of the special character determines ordering."
- #
- # Only a subset of the character set is used here; mostly to
- # illustrate the set-up.
- #
-
- #
- collating-symbol <LOW_VALUE>
- collating-symbol <LOWER-CASE>
- collating-symbol <SUBSCRIPT-LOWER>
- collating-symbol <SUPERSCRIPT-LOWER>
- collating-symbol <UPPER-CASE>
- collating-symbol <NO-ACCENT>
- collating-symbol <PECULIAR>
- collating-symbol <LIGATURE>
- collating-symbol <ACUTE>
- collating-symbol <GRAVE>
- # Further collating-symbols follow.
- #
- # Properly, the standard does not include any multi-character
- # collating elements; the one below is added for completeness.
- #
- collating_element <ch> from <c><h>
- collating_element <CH> from <C><H>
- collating_element <Ch> from <C><h>
- #
- order_start forward;backward;forward;forward,position
- #
- # Collating symbols are specified first in the sequence to allocate
- # basic collation values to them, lower than that of any character.
-
- <LOW_VALUE>
- <LOWER-CASE>
- <SUBSCRIPT-LOWER>
- <SUPERSCRIPT-LOWER>
- <UPPER-CASE>
- <NO-ACCENT>
- <PECULIAR>
- <LIGATURE>
- <ACUTE>
- <GRAVE>
- <RING-ABOVE>
- <DIAERESIS>
- <TILDE>
- # Further collating symbols are given a basic collating value here.
- #
- # Special characters follow here.
- <space> IGNORE;IGNORE;IGNORE;<space>
- # Other special characters follow here.
- #
- # The regular characters follow here.
- <a> <a>;<NO-ACCENT>;<LOWER-CASE>;IGNORE
- <A> <a>;<NO-ACCENT>;<UPPER-CASE>;IGNORE
- <a-acute> <a>;<ACUTE>;<LOWER-CASE>;IGNORE
- <A-acute> <a>;<ACUTE>;<UPPER-CASE>;IGNORE
- <a-grave> <a>;<GRAVE>;<LOWER-CASE>;IGNORE
- <A-grave> <a>;<GRAVE>;<UPPER-CASE>;IGNORE
- <ae> <a><e>;<LIGATURE><LIGATURE>;<LOWER-CASE><LOWER-CASE>;IGNORE
- <AE> <a><e>;<LIGATURE><LIGATURE>;<UPPER-CASE><UPPER-CASE>;IGNORE
- <b> <b>;<NO-ACCENT>;<LOWER-CASE>;IGNORE
- <B> <b>;<NO-ACCENT>;<UPPER-CASE>;IGNORE
- <c> <c>;<NO-ACCENT>;<LOWER-CASE>;IGNORE
- <C> <c>;<NO-ACCENT>;<UPPER-CASE>;IGNORE
- <ch> <ch>;<NO-ACCENT>;<LOWER-CASE>;IGNORE
- <Ch> <ch>;<NO-ACCENT>;<PECULIAR>;IGNORE
- <CH> <ch>;<NO-ACCENT>;<UPPER-CASE>;IGNORE
- #
- # As an example, the strings "Bach" and "bach" could be encoded (for
- # compare purposes) as:
- # "Bach" <b>;<a>;<ch>;<LOW_VALUE>;<NO_ACCENT>;<NO_ACCENT>;\
- # <NO_ACCENT>;<LOW_VALUE>;<UPPER>;<LOWER>;<LOWER>;<NULL>
- # "bach" <b>;<a>;<ch>;<LOW_VALUE>;<NO_ACCENT>;<NO_ACCENT>;\
- # <NO_ACCENT>;<LOW_VALUE>;<LOWER>;<LOWER>;<LOWER>;<NULL>
- #
- # The two strings are equal in pass 1 and 2, but differ in pass 3.
- #
- # Further characters follow.
- #
- UNDEFINED IGNORE;IGNORE;IGNORE;IGNORE
- #
- order_end
- #
- END LC_COLLATE
- #
- LC_MONETARY
- int_curr_symbol "USD "
- currency_symbol "$"
- mon_decimal_point "."
- mon_grouping 3;0
- positive_sign ""
- negative_sign "-"
- p_cs_precedes 1
- n_sign_posn 0
- END LC_MONETARY
- #
- LC_NUMERIC
- copy "US_en.ASCII"
- END LC_NUMERIC
- #
- LC_TIME
- abday "Sun";"Mon";"Tue";"Wed";"Thu";"Fri";"Sat"
- #
- day "Sunday";"Monday";"Tuesday";"Wednesday";\
- "Thursday";"Friday";"Saturday"
- #
- abmon "Jan";"Feb";"Mar";"Apr";"May";"Jun";\
- "Jul";"Aug";"Sep";"Oct";"Nov";"Dec"
- #
- mon "January";"February";"March";"April";\
- "May";"June";"July";"August";"September";\
- "October";"November";"December"
- #
- d_t_fmt "%a %b %d %T %Z %Y\n"
- END LC_TIME
- #
- LC_MESSAGES
- yesexpr "^([yY][[:alpha:]]*)|(OK)"
- #
- noexpr "^[nN][[:alpha:]]*"
- END LC_MESSAGES
-
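The multi-pass comparison described in the LC_COLLATE comments of the example can be sketched in Python as follows. This is only an illustration under stated assumptions: the tables cover a handful of the example's entries, the names SYMBOLS, BASE, TABLE, elements, and sort_key are hypothetical, and the fourth (special character, position) pass is left out because every element listed here has IGNORE as its fourth weight.

```python
# Collating symbols in the order the example assigns them values, lowest first.
SYMBOLS = ["LOW_VALUE", "LOWER-CASE", "SUBSCRIPT-LOWER", "SUPERSCRIPT-LOWER",
           "UPPER-CASE", "NO-ACCENT", "PECULIAR", "LIGATURE", "ACUTE", "GRAVE"]
RANK = {name: value for value, name in enumerate(SYMBOLS)}

# Base letters in the order they appear in the example (subset only).
BASE = {"a": 0, "b": 1, "c": 2, "ch": 3}

# element -> (pass 1 base, pass 2 accent symbol, pass 3 case symbol)
TABLE = {
    "a": ("a", "NO-ACCENT", "LOWER-CASE"),
    "A": ("a", "NO-ACCENT", "UPPER-CASE"),
    "\u00e1": ("a", "ACUTE", "LOWER-CASE"),      # <a-acute>
    "b": ("b", "NO-ACCENT", "LOWER-CASE"),
    "B": ("b", "NO-ACCENT", "UPPER-CASE"),
    "c": ("c", "NO-ACCENT", "LOWER-CASE"),
    "C": ("c", "NO-ACCENT", "UPPER-CASE"),
    "ch": ("ch", "NO-ACCENT", "LOWER-CASE"),     # multicharacter element
    "Ch": ("ch", "NO-ACCENT", "PECULIAR"),
    "CH": ("ch", "NO-ACCENT", "UPPER-CASE"),
}


def elements(string):
    """Split a string into collating elements, preferring two-character ones."""
    i = 0
    while i < len(string):
        step = 2 if string[i:i + 2] in TABLE else 1
        yield TABLE[string[i:i + step]]   # KeyError for characters not listed
        i += step


def sort_key(string):
    """Build one weight list per pass: forward, backward, forward."""
    elems = list(elements(string))
    pass1 = [BASE[base] for base, _, _ in elems]
    pass2 = [RANK[accent] for _, accent, _ in reversed(elems)]
    pass3 = [RANK[case] for _, _, case in elems]
    return (pass1, pass2, pass3)
```

With these tables, sorted(["Bach", "bach"], key=sort_key) places "bach" first: the two keys agree in passes 1 and 2 and differ only in pass 3, as the comment in the example states.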
- END_RATIONALE
-
-
- 2.6 Environment Variables
-
- Environment variables defined in this clause affect the operation of
- multiple utilities and applications. There are other environment
- variables that are of interest only to specific utilities. Environment
- variables that apply to a single utility only are defined as part of the
- utility description. See the Environment Variables subclause of the
- utility descriptions for information on environment variable usage.
-
- The value of an environment variable is a string of characters, as
- described in 2.7 in POSIX.1 {8}.
-
- Environment variable names used by the standard utilities shall consist
- solely of uppercase letters, digits, and the _ (underscore) from the
- characters defined in 2.4. The namespace of environment variable names
- containing lowercase letters shall be reserved for applications.
- Applications can define any environment variables with names from this
- namespace without modifying the behavior of the standard utilities.
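
The naming rule above can be expressed as a small check. The following Python sketch is illustrative only; the function names are hypothetical, and it assumes that the letters in question are the ASCII letters of the portable character set.

```python
import re

# Names the standard utilities may use: uppercase letters, digits, underscore.
STANDARD_NAME = re.compile(r'[A-Z0-9_]+')


def usable_by_standard_utilities(name):
    """True if the name consists solely of uppercase letters, digits, and _."""
    return STANDARD_NAME.fullmatch(name) is not None


def reserved_for_applications(name):
    """True if the name contains a lowercase letter (application namespace)."""
    return any('a' <= ch <= 'z' for ch in name)
```

For example, LC_ALL satisfies the first check, while a name such as my_editor falls in the namespace reserved for applications.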
-
- If the following variables are present in the environment during the
- execution of an application or utility, they are given the meaning
- described below. They may be put into the environment, or changed, by
- either the implementation or the user. If they are defined in the
- utility's environment, the standard utilities assume they have the
- specified meaning. Conforming applications shall not set these
- environment variables to have meanings other than as described. See 7.2
- and 3.12 for methods of accessing these variables.
-
- HOME A pathname of the user's home directory.
-
- LANG This variable shall determine the locale category for
- any category not specifically selected via a variable
- starting with LC_. LANG and the LC_ variables can be
- used by applications to determine the language for
- messages and instructions, collating sequences, date
- formats, etc. Additional semantics of this variable,
- if any, are implementation defined.
-
- LC_ALL This variable shall override the value of the LANG
- variable and the value of any of the other variables
- starting with LC_.
-
- LC_COLLATE This variable shall determine the locale category for
- character collation information within bracketed
- regular expressions and for sorting. This
- environment variable determines the behavior of
- ranges, equivalence classes, and multicharacter
- collating elements. Additional semantics of this
- variable, if any, are implementation defined.
-
- LC_CTYPE This variable shall determine the locale category for
- character handling functions. This environment
- variable shall determine the interpretation of
- sequences of bytes of text data as characters (e.g.,
- single- versus multibyte characters), the
- classification of characters (e.g., alpha, digit,
- graph), and the behavior of character classes.
- Additional semantics of this variable, if any, are
- implementation defined.
-
- LC_MESSAGES This variable shall determine the locale category for
- processing affirmative and negative responses and the
- language and cultural conventions in which messages
- should be written. Additional semantics of this
- variable, if any, are implementation defined. The
- language and cultural conventions of diagnostic and
- informative messages whose format is unspecified by
- this standard should be affected by the setting of
- LC_MESSAGES.
-
- LC_MONETARY This variable shall determine the locale category for
- monetary-related numeric formatting information.
- Additional semantics of this variable, if any, are
- implementation defined.
-
- LC_NUMERIC This variable shall determine the locale category for
- numeric formatting (for example, thousands separator
- and radix character) information. Additional
- semantics of this variable, if any, are
- implementation defined.
-
- LC_TIME This variable shall determine the locale category for
- date and time formatting information. Additional
- semantics of this variable, if any, are
- implementation defined.
-
- LOGNAME The user's login name.
-
- PATH The sequence of path prefixes that certain functions
- and utilities apply in searching for an executable
- file known only by a filename. The prefixes shall be
- separated by a colon (:). When a nonzero-length
- prefix is applied to this filename, a slash shall be
- inserted between the prefix and the filename. A
- zero-length prefix is an obsolescent feature that
- indicates the current working directory. It appears
- as two adjacent colons (::), as an initial colon
- preceding the rest of the list, or as a trailing
- colon following the rest of the list. A Strictly
- Conforming POSIX.2 Application shall use an actual
- pathname (such as '.') to represent the current
- working directory in PATH. The list shall be
- searched from beginning to end, applying the filename
- to each prefix, until an executable file with the
- specified name and appropriate execution permissions
- is found. If the pathname being sought contains a
- slash, the search through the path prefixes shall not
- be performed. If the pathname begins with a slash,
- the specified path shall be resolved as described in
- 2.2.2.104. If PATH is unset or is set to null, the
- path search is implementation-defined.
-
- SHELL A pathname of the user's preferred command language
- interpreter. If this interpreter does not conform to
- the shell command language in Section 3, utilities
- may behave differently than described in this
- standard.
-
- TMPDIR A pathname of a directory made available for programs
- that need a place to create temporary files.
-
- TERM The terminal type for which output is to be prepared.
- This information is used by utilities and application
- programs wishing to exploit special capabilities
- specific to a terminal. The format and allowable
- values of this environment variable are unspecified.
-
- TZ Time-zone information. The format is described in
- POSIX.1 {8} 8.1.1.
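
The search described in the PATH entry above can be sketched in Python as follows. This is a simplified illustration, not a definitive implementation: the name search_path is hypothetical, a pathname containing a slash is simply returned unsearched, and the implementation-defined search for an unset or null PATH is represented by returning None.

```python
import os


def search_path(name, path):
    """Return the first executable pathname found for name, or None."""
    if "/" in name:
        return name     # no path search is performed for names with a slash
    if not path:
        return None     # unset or null PATH: search is implementation-defined
    for prefix in path.split(":"):
        if prefix == "":
            # Zero-length prefix (obsolescent): the current working directory.
            candidate = name
        else:
            # A slash is inserted between a nonzero-length prefix and the name.
            candidate = prefix + "/" + name
        # Continue until a file with appropriate execute permission is found.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

The prefixes are tried from beginning to end, so an earlier directory holding a non-executable file of the same name does not end the search.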
-
- The environment variables LANG, LC_ALL, LC_COLLATE, LC_CTYPE,
- LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and LC_TIME (LC_*) provide for the
- support of internationalized applications. The standard utilities shall
- make use of these environment variables as described in this clause and
- the individual Environment Variables subclauses for the utilities. If
- these variables specify locale categories that are not based upon the
- same underlying code set, the results are unspecified.
-
- For utilities used in internationalized applications, if LC_ALL is
- not set in the environment or is set to the empty string, and if any of
- the LC_* variables is not set in the environment or is set to the empty
- string, the operational behavior of the utility for the corresponding
- locale category shall be determined by the setting of the LANG
- environment variable. If the LANG environment variable is not set or is
- set to the empty string, the implementation-defined default locale shall
- be used.
-
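The precedence among LC_ALL, the individual LC_* variables, and LANG described above can be sketched as a lookup. The Python below is illustrative only; effective_locale and the placeholder default value are hypothetical names, and the locale names used in any call are example values rather than names defined by this standard.

```python
CATEGORIES = ("LC_COLLATE", "LC_CTYPE", "LC_MESSAGES",
              "LC_MONETARY", "LC_NUMERIC", "LC_TIME")


def effective_locale(category, env, default="implementation-default"):
    """Return the locale name that governs one category (sketch only)."""
    if category not in CATEGORIES:
        raise ValueError(f"not a locale category variable: {category!r}")
    # LC_ALL overrides the category variable, which overrides LANG.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:           # unset and empty string are treated alike
            return value
    return default          # stands in for the implementation-defined default
```

With env = {"LANG": "fr_FR", "LC_TIME": "de_DE"}, the sketch selects de_DE for LC_TIME and fr_FR for every other category; adding LC_ALL to the environment overrides both.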
- If LANG (or any of the LC_* environment variables) contains the value
- "C", or the value "POSIX", the POSIX Locale shall be selected and the
- standard utilities shall behave in accordance with the rules in 2.5.1
- for the associated category.
-
- If LANG (or any of the LC_* environment variables) begins with a slash,
- it shall be interpreted as the pathname of a file that was created in the
- output format used by the localedef utility; see 4.35.6.3. Referencing
- such a pathname shall result in that locale being used for the category
- indicated.
-
- If LANG (or any of the LC_* environment variables) contains one of a set
- of implementation-defined values, the standard utilities shall behave in
- accordance with the rules in a corresponding implementation-defined
- locale description for the associated category.
-
- If LANG (or any of the LC_* environment variables) contains a value that
- the implementation does not recognize, the behavior is unspecified.
-
- Additional criteria for determining a valid locale name are
- implementation defined.
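
The four cases above for the value of LANG or an LC_* variable can be summarized in a short sketch. The Python below is illustrative only: classify_locale_value, its return labels, and the known parameter (standing in for the implementation-defined set of locale names) are all hypothetical.

```python
def classify_locale_value(value, known=()):
    """Say how a non-empty locale variable value would be interpreted."""
    if value in ("C", "POSIX"):
        return "POSIX Locale"
    if value.startswith("/"):
        # Pathname of a file in the output format of the localedef utility.
        return "localedef output pathname"
    if value in known:
        return "implementation-defined locale"
    return "unspecified behavior"
```

An unset or empty value never reaches this step, because the fallback to LANG and then to the implementation-defined default applies first.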
-
- BEGIN_RATIONALE
-
-
- 2.6.1 Environment Variables Rationale. (This subclause is not a part of
- P1003.2)
-
- The standard is worded so that the specified variables may be provided to
- the application. There is no way that the implementation can guarantee
- that a utility will ever see an environment variable, as a parent process
- can change the environment for its children. The env -i command in this
- standard and the POSIX.1 {8} exec family both offer ways to remove any of
- these variables from the environment.
-
- The language about locale implies that any utilities written in Standard
- C and conforming to POSIX.2 must issue the following call:
-
- setlocale(LC_ALL, "")
-
- If this were omitted, the C Standard {7} specifies that the C Locale
- would be used.
-
- If any of the environment variables is invalid, it makes sense to default
- to an implementation-defined, consistent locale environment. It is more
- confusing for a user to have partial settings occur in case of a mistake.
- All utilities would then behave in one language/cultural environment.
- Furthermore, it provides a way of forcing the whole environment to be the
- implementation-defined default. Disastrous results could occur if a
- pipeline of utilities partially uses the environment variables in
- different ways. In this case, it would be appropriate for utilities that
- use LANG and related variables to exit with an error if any of the
- variables are invalid. For example, users typing individual commands at
- a terminal might want date to work if LC_MONETARY is invalid as long as
- LC_TIME is valid. Since these are conflicting reasonable alternatives,
- POSIX.2 leaves the results unspecified if the locale environment
- variables would not produce a complete locale matching the user's
- specification.
-
- The locale settings of individual categories cannot be truly independent
- and still guarantee correct results. For example, when collating two
- strings, characters must first be extracted from each string (governed by
- LC_CTYPE) before being mapped to collating elements (governed by
- LC_COLLATE) for comparison. That is, if LC_CTYPE is causing parsing
- according to the rules of a large, multibyte code set (potentially
- returning 20000 or more distinct character code set values), but
- LC_COLLATE is set to handle only an 8-bit code set with 256 distinct
- characters, meaningful results are obviously impossible.
-
- The LC_MESSAGES variable affects the language of messages generated by
- the standard utilities. This standard does not provide a means whereby
- applications can easily be written to perform similar feats. Future
- versions of POSIX.1 {8} and POSIX.2 are expected to provide both
- functions and utilities to accomplish multilanguage messaging (using
- message catalogs), but such facilities were not ready for standardization
- at the time the initial versions of the standards were developed.
-
- This clause is not a full list of all environment variables, but only
- those of importance to multiple utilities. Nevertheless, to satisfy some
- members of the balloting group, here is a list of the other environment
- variable symbols mentioned in this standard:
-
- Variable   Utility     Variable    Utility
- ________   _______     _________   _______
- CDPATH     cd          MAKEFLAGS   make
- COLUMNS    ls          OPTARG      getopts
- DEAD       mailx       OPTIND      getopts
- IFS        sh          PRINTER     lp
- LPDEST     lp          PS1         sh
- MAIL       sh          PS2         sh
- MAILRC     mailx
-
- The description of PATH is similar to that in POSIX.1 {8}, except:
-
- - The behavior of a null prefix is marked obsolescent in favor of
- using a real pathname. This was done at the behest of some members
- of the balloting group, who apparently felt it offered a more
- secure environment, where the current directory would not be
- selected unintentionally.
-
- - The POSIX.1 {8} exec description requires an implementation-defined
- path search when PATH is ``not present.'' POSIX.2 spells out that
- this means ``unset or set to null.'' Many implementations
- historically have used a default value of /bin and /usr/bin.
- POSIX.2 does not mandate that this default path be identical to
- that retrieved from getconf _CS_PATH because it is likely that a
- transition to POSIX.2 conformance will see the newly-standardized
- utilities in another directory that needs to be isolated from some
- historical applications.
-
- - The POSIX.1 {8} PATH description is ambiguous about whether an
- ``executable file'' means one that has the appropriate permissions
- for the searching process to execute it. One reading would say
- that a file with any of the execution bits set on would satisfy the
- search and that an [EACCES] could be returned at that point. This
- is not the way historical systems work and POSIX.2 has clarified it
- to mean that the path search will continue until it finds the name
- with the execute permissions that would allow the process to
- execute it. (The case of the [ENOEXEC] error is handled in the
- text of 3.9.1.1.)
-
- The terminology ``beginning to end'' is used in PATH to avoid the
- noninternationalized ``left to right.'' There is no way to have a colon
- character embedded within a pathname that is part of the PATH variable
- string. Colon is not a member of the portable filename character set, so
- this should not be a problem. A portable application can retrieve a
- default PATH value (that will allow access to all the standard utilities)
- from the system using the command:
-
- getconf _CS_PATH
-
- See the rationale for the command utility for an example of using this.
-
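For comparison, a program can obtain the same default PATH value without running getconf. The Python sketch below uses os.confstr, which exposes this configuration string under the name "CS_PATH" on systems that provide it; the wrapper name default_path is hypothetical, and returning None when the value is unavailable is an assumption of this sketch.

```python
import os


def default_path():
    """Return the system default PATH, or None if it is not available."""
    try:
        return os.confstr("CS_PATH")
    except (AttributeError, ValueError, OSError):
        # os.confstr is missing, the name is unknown, or the query failed.
        return None
```

An application could assign the returned string to PATH before invoking standard utilities, so that they are found regardless of the PATH it inherited.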
- The SHELL variable names the user's preferred shell; it is a guide to
- applications. There is no direct requirement that that shell conform to
- this standard--that decision should rest with the user. It is the
- intention of the developers of this standard that alternative shells be
- permitted, if the user chooses to develop or acquire one. An operating
- system that builds its shell into the ``kernel'' in such a manner that
- alternative shells would be impossible does not conform to the spirit of
- the standard.
-
- The following environment variables are not currently used by the
- standard utilities (although they may be by future UPE utilities).
- Implementations should reserve the names for the following purposes:
-
- EDITOR The name of the user's preferred text file editor. The
- value of this variable is the name of a utility: either a
- pathname containing a slash, or a filename to be located
- using the PATH environment variable.
-
- VISUAL The name of the user's preferred ``visual,'' or full-
- screen, text file editor. The value of this variable is
- the name of a utility: either a pathname containing a
- slash, or a filename to be located using the PATH
- environment variable.
-
- The decision to restrict conforming systems to the use of digits,
- uppercase letters, and underscores for environment variable names allows
- applications to use lowercase letters in their environment variable names
- without conflicting with any conforming system.
-
- PROCLANG was added to an earlier draft for internationalized
- applications, but was removed from the standard because the working group
- determined that it was not of use.
-
- USER was removed from an earlier draft because it was an unreasonable
- duplication of LOGNAME.
-
- END_RATIONALE
-
-
- 2.7 Required Files
-
- The following directories shall exist on conforming systems and shall be
- used as described. Strictly Conforming POSIX.2 Applications shall not
- assume the ability to create files in any of these directories.
-
- / The root directory.
-
- /dev Contains /dev/null and /dev/tty, described below.
-
- The following directory shall exist on conforming systems and shall be
- used as described.
-
- /tmp A directory made available for programs that need a place
- to create temporary files. Applications shall be allowed
- to create files in this directory, but shall not assume
- that such files are preserved between invocations of the
- application.
-
- The following files shall exist on conforming systems and shall be both
- readable and writable.
-
- /dev/null An infinite data source/sink. Data written to /dev/null
- is discarded. Reads from /dev/null always return end-of-
- file (EOF).
-
- /dev/tty In each process, a synonym for the controlling terminal
- associated with the process group of that process, if any.
- It is useful for programs or shell procedures that wish to
- be sure of writing messages to or reading data from the
- terminal no matter how output has been redirected.
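
The behavior required of /dev/null above can be observed with a few lines of Python. This is a minimal sketch; the function name null_device_roundtrip is hypothetical.

```python
def null_device_roundtrip(data=b"discarded"):
    """Write to /dev/null, then read from it; return (bytes written, bytes read)."""
    with open("/dev/null", "wb") as sink:
        written = sink.write(data)      # the data is accepted and discarded
    with open("/dev/null", "rb") as source:
        read_back = source.read(16)     # a read returns end-of-file at once
    return written, read_back
```

On a conforming system the write succeeds for all of the data and the read returns an empty byte string, which is how end-of-file appears to a Python program.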
-
- BEGIN_RATIONALE
-
-
- 2.7.1 Required Files Rationale. (This subclause is not a part of
- P1003.2)
-
- A description of the historical /usr/tmp was omitted, removing any
- concept of differences in emphasis between the / and /usr versions. The
- descriptions of /bin, /usr/bin, /lib, and /usr/lib were omitted because
- they are not useful for applications. In an early draft, a distinction
- was made between system and application directory usage, but this was not
- found to be useful.
-
- In Draft 8, /, /dev, /local, /usr/local, and /usr/man were removed. The
- directories / and /dev were restored in Draft 9. It was pointed out by
- several balloters that the notion of a hierarchical directory structure
- is key to other information presented in later sections of the standard.
- (Previously, some had argued that special devices and temporary files
- could conceivably be handled without a directory structure on some
- implementations. For example, the system could treat the characters
- ``/tmp'' as a special token that would store files using some non-POSIX
- file system structure. This notion was rejected by the working group,
- which requires that all the files in this clause be implemented via POSIX
- file systems.)
-
- The /tmp directory is retained in the standard to accommodate historical
- applications that assume its availability. Future implementations are
- encouraged to provide suitable directory names in TMPDIR and future
- applications are encouraged to use the contents of TMPDIR for creating
- temporary files.
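
An application following this encouragement might choose its temporary directory as in the Python sketch below. The function names are hypothetical, and falling back to /tmp when TMPDIR is unset or empty is an assumption of the sketch, not a requirement stated here.

```python
import os
import tempfile


def scratch_directory(env=None):
    """Prefer the directory named by TMPDIR; otherwise fall back to /tmp."""
    env = os.environ if env is None else env
    return env.get("TMPDIR") or "/tmp"


def make_scratch_file(env=None):
    """Create an empty temporary file in the chosen directory; return its path."""
    fd, path = tempfile.mkstemp(dir=scratch_directory(env))
    os.close(fd)
    return path
```

The caller is expected to remove the file when finished, since files in these directories are not assumed to be preserved between invocations.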
-
- The standard files /dev/null and /dev/tty are required to be both
- readable and writable to allow applications to have the intended
- historical access to these files.
-
- END_RATIONALE
-
-
- 2.8 Regular Expression Notation
-
- Editor's Note: The entire rationale for this clause appears at the end
- of the clause.
-
- Regular Expressions (REs) provide a mechanism to select specific strings
- from a set of character strings.
-
- Regular expressions are a context-independent syntax that can represent a
- wide variety of character sets and character set orderings, where these
- character sets are interpreted according to the current locale. While
- many regular expressions can be interpreted differently depending on the
- current locale, many features, such as character class expressions,
- provide for contextual invariance across locales.
-
- The Basic Regular Expression (BRE) notation and construction rules in
- 2.8.3 shall apply to most utilities supporting regular expressions. Some
- utilities, instead, support the Extended Regular Expressions (ERE)
- described in 2.8.4; any exceptions for both cases are noted in the
- descriptions of the specific utilities using regular expressions. Both
- BREs and EREs are supported by the Regular Expression Matching interface
- in 7.3.
-
-
- 2.8.1 Regular Expression Definitions
-
- For the purposes of this clause, the following definitions apply.
-
-
- 2.8.1.1 entire regular expression: The concatenated set of one or more
- BREs or EREs that make up the pattern specified for string selection.
-
- 2.8.1.2 matched: A sequence of zero or more characters is said to be
- matched by a BRE or ERE when the characters in the sequence correspond
- to a sequence of characters defined by the pattern.
-
- Matching shall be based on the bit pattern used for encoding the
- character, not on the graphic representation of the character.
-
- The search for a matching sequence shall start at the beginning of a
- string and stop when the first sequence matching the expression is found,
- where ``first'' is defined to mean ``begins earliest in the string.'' If
- the pattern permits a variable number of matching characters and thus
- there is more than one such sequence starting at that point, the longest
- such sequence shall be matched. For example: the BRE bb* matches the
- second through fourth characters of abbbc, and the ERE
- (wee|week)(knights|night) matches all ten characters of weeknights.
-
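The leftmost-longest rule above can be demonstrated by brute force. The Python sketch below is an illustration only: Python's re module is not a POSIX matcher (for alternation it takes the first alternative that succeeds rather than the longest), so the function tries every start position in order and, for each, every end position from longest to shortest. The name leftmost_longest is hypothetical, the patterns must be written in Python syntax (which coincides with the examples used here), and anchors are not treated faithfully.

```python
import re


def leftmost_longest(pattern, string):
    """Return (start, end) of the earliest, then longest, match, or None."""
    compiled = re.compile(pattern)
    for start in range(len(string) + 1):               # earliest start wins
        for end in range(len(string), start - 1, -1):  # then the longest
            if compiled.fullmatch(string, start, end):
                return start, end
    return None
```

For the two examples in the text, the sketch reports offsets (1, 4) for bb* in abbbc and (0, 10) for (wee|week)(knights|night) in weeknights. The difference from Python's own search shows with the pattern a|ab against ab: re.search stops after the single character a, while the rule above selects both characters.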
- Consistent with the whole match being the longest of the leftmost
- matches, each subpattern, from left to right, shall match the longest
- possible string. For this purpose, a null string shall be considered to
- be longer than no match at all. For example, matching the BRE \(.*\).*
- against abcdef, the subexpression (\1) is abcdef, and matching the BRE
- \(a*\)* against bc, the subexpression (\1) is the null string.
-
- When a multicharacter collating element in a bracket expression (see
- 2.8.3.2) is involved, the longest sequence shall be measured in
- characters consumed from the string to be matched; i.e., the collating
- element counts not as one element, but as the number of characters it
- matches.
-
-
- 2.8.1.3 BRE [ERE] matching a single character: A BRE or ERE that
- matches either a single character or a single collating element.
-
- Only a BRE or ERE of this type that includes a bracket expression (see
- 2.8.3.2) can match a collating element.
-
- 2.8.1.4 BRE [ERE] matching multiple characters: A BRE or ERE that
- matches a concatenation of single characters or collating elements.
-
- Such a BRE or ERE is made up from a BRE (ERE) matching a single character
- and BRE (ERE) special characters.
-
-
- 2.8.2 Regular Expression General Requirements
-
- The requirements in this subclause shall apply to both basic and extended
- regular expressions.
-
- The use of regular expressions is generally associated with text
- processing; that is, REs (BREs and EREs) operate on text strings, which
- are zero or more characters followed by an end-of-string delimiter
- (typically NUL). Some utilities employing regular expressions limit the
- processing to lines; i.e., zero or more characters followed by a
- <newline>. In the regular expression processing described in this
- standard, the <newline> character is regarded as an ordinary character.
- This standard specifies within the individual descriptions of those
- standard utilities employing regular expressions whether they permit
- matching of <newline>s; if not stated otherwise, the use of literal
- <newline>s or any escape sequence equivalent produces undefined results.
-
- The interfaces specified in this standard do not permit the inclusion of
- a NUL character in an RE or in the string to be matched. If during the
- operation of a standard utility a NUL is included in the text designated
- to be matched, that NUL may designate the end of the text string for the
- purposes of matching.
-
- When a standard utility or function that uses regular expressions
- specifies that pattern matching shall be performed without regard to the
- case (upper- or lower-) of either data or patterns, then when each
- character in the string is matched against the pattern, not only the
- character, but also its case counterpart (if any), shall be matched.
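
The case-insensitive rule above can be sketched for the simplest situation, a pattern made up only of ordinary characters. The Python below is illustrative; the function names are hypothetical, and str.swapcase is used as a stand-in for the locale's notion of a case counterpart.

```python
def case_candidates(ch):
    """The character itself plus its case counterpart, if it has one."""
    return {ch, ch.swapcase()}


def literal_match_ignoring_case(pattern, text):
    """Match a pattern of ordinary characters against text, ignoring case."""
    return len(pattern) == len(text) and all(
        p in case_candidates(t) for p, t in zip(pattern, text))
```

Each character of the string is offered to the pattern in both of its cases, so abc matches aBC; a character with no case counterpart, such as a digit, is compared only as itself.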
-
- The implementation shall support any regular expression that does not
- exceed 256 bytes in length.
-
- This clause uses the term ``invalid'' for certain constructs or
- conditions. Invalid REs shall cause the utility or function using the RE
- to generate an error condition. When ``invalid'' is not used, violations
- of the specified syntax or semantics for REs produce undefined results:
- this may entail an error, enabling an extended syntax for that RE, or
- using the construct in error as literal characters to be matched.
-
-
- 2.8.3 Basic Regular Expressions
-
-
- 2.8.3.1 BREs Matching a Single Character or Collating Element
-
- A BRE ordinary character, a special character preceded by a backslash, or
- a period shall match a single character. A bracket expression shall
- match a single character or a single collating element.
-
- 2.8.3.1.1 BRE Ordinary Characters
-
- An ordinary character is a BRE that matches itself: any character in the
- supported character set, except for the BRE special characters listed in
- 2.8.3.1.2.
-
- The interpretation of an ordinary character preceded by a backslash (\)
- is undefined, except for:
-
- (1) The characters ), (, {, and }.
-
- (2) The digits 1 through 9 (see 2.8.3.3).
-
- (3) A character inside a bracket expression.
-
- 2.8.3.1.2 BRE Special Characters
-
- A BRE special character has special properties in certain contexts.
- Outside of those contexts, or when preceded by a backslash, such a
- character shall be a BRE that matches the special character itself. The
- BRE special characters and the contexts in which they have their special
- meaning are:
-
- . [ \   The period, left-bracket, and backslash shall be special
-         except when used in a bracket expression (see 2.8.3.2). An
-         expression containing a [ that is not preceded by a backslash
-         and is not part of a bracket expression produces undefined
-         results.
-
- *       The asterisk is special except when used
-
-         - In a bracket expression,
-
-         - As the first character of an entire BRE (after an initial
-           ^, if any), or
-
-         - As the first character of a subexpression (after an
-           initial ^, if any); see 2.8.3.3.
-
- ^       The circumflex shall be special when used
-
-         - As an anchor (see 2.8.3.5), or
-
-         - As the first character of a bracket expression (see
-           2.8.3.2).
-
- $       The dollar-sign shall be special when used as an anchor.
-
- 2.8.3.1.3 Periods in BREs
-
- A period (.), when used outside of a bracket expression, is a BRE that
- shall match any character in the supported character set except NUL.
-
-
- 2.8.3.2 RE Bracket Expression
-
- A bracket expression (an expression enclosed in square brackets, []) is
- an RE that matches a single collating element contained in the nonempty
- set of collating elements represented by the bracket expression.
-
- The following rules and definitions apply to bracket expressions:
-
- (1) A bracket expression is either a matching list expression or a
- nonmatching list expression. It consists of one or more
- expressions: collating elements, collating symbols, equivalence
- classes, character classes, or range expressions. Strictly
- Conforming POSIX.2 Applications shall not use range expressions,
- but conforming implementations shall support regular expressions
- containing range expressions. The right-bracket (]) shall lose
- its special meaning and represent itself in a bracket expression
- if it occurs first in the list [after an initial circumflex (^),
- if any]. Otherwise, it shall terminate the bracket expression,
- unless it appears in a collating symbol (such as [.].]) or is
- the ending right-bracket for a collating symbol, equivalence
- class, or character class. The special characters
-
- . * [ \
-
- (period, asterisk, left-bracket, and backslash, respectively)
- shall lose their special meaning within a bracket expression.
-
- The character sequences
-
- [. [= [:
-
- (left-bracket followed by a period, equals-sign, or colon) shall
- be special inside a bracket expression and are used to delimit
- collating symbols, equivalence class expressions, and character
- class expressions. These symbols shall be followed by a valid
- expression and the matching terminating sequence .], =], or :],
- as described in the following items.
-
- (2) A matching list expression specifies a list that shall match any
- one of the expressions represented in the list. The first
- character in the list shall not be the circumflex. For example,
- [abc] is an RE that matches any of a, b, or c.
-
- (3) A nonmatching list expression begins with a circumflex (^), and
- specifies a list that shall match any character or collating
- element except for the expressions represented in the list after
- the leading circumflex. For example, [^abc] is an RE that
- matches any character or collating element except a, b, or c.
- The circumflex shall have this special meaning only when it
- occurs first in the list, immediately following the
- left-bracket.
-
- (4) A collating symbol is a collating element enclosed within
- bracket-period ([. .]) delimiters. Collating elements are
- defined as described in 2.5.2.2.4. Multicharacter collating
- elements shall be represented as collating symbols when it is
- necessary to distinguish them from a list of the individual
- characters that make up the multicharacter collating element.
- For example, if the string ch is a collating element in the
- current collation sequence with the associated collating symbol
- <ch>, the expression [[.ch.]] shall be treated as an RE matching
- the character sequence ch, while [ch] shall be treated as an RE
- matching c or h. Collating symbols shall be recognized only
- inside bracket expressions. This implies that the RE [[.ch.]]*c
- shall match the first through fifth characters in the string
- chchch. If the string is not a collating element in the current
- collating sequence definition, or if the collating element has
- no characters associated with it (e.g., see the symbol <HIGH> in
- the example collation definition shown in 2.5.2.2.4), the symbol
- shall be treated as an invalid expression.
-
- (5) An equivalence class expression shall represent the set of
- collating elements belonging to an equivalence class, as
- described in 2.5.2.2.4. Only primary equivalence classes shall
- be recognized. The class shall be expressed by enclosing any
- one of the collating elements in the equivalence class within
- bracket-equal ([= =]) delimiters. For example, if a, à, and â
- belong to the same equivalence class, then [[=a=]b], [[=à=]b],
- and [[=â=]b] shall each be equivalent to [aàâb]. If the
- collating element does not belong to an equivalence class, the
- equivalence class expression shall be treated as a collating
- symbol.
-
- (6) A character class expression shall represent the set of
- characters belonging to a character class, as defined in the
- LC_CTYPE category in the current locale. All character classes
- specified in the current locale shall be recognized. A
- character class expression shall be expressed as a character
- class name enclosed within bracket-colon ([: :]) delimiters.
-
- Strictly conforming POSIX.2 applications shall only use the
- following character class expressions, which shall be supported
- on all conforming implementations:
-
- [:alnum:] [:cntrl:] [:lower:] [:space:]
- [:alpha:] [:digit:] [:print:] [:upper:]
- [:blank:] [:graph:] [:punct:] [:xdigit:]
-
- (7) A range expression represents the set of collating elements that
- fall between two elements in the current collation sequence,
- inclusively. It shall be expressed as the starting point and
- the ending point separated by a hyphen (-).
-
- Range expressions shall not be used in Strictly Conforming
- POSIX.2 Applications because their behavior is dependent on the
- collating sequence. Range expressions shall be supported by
- conforming implementations.
-
- In the following, all examples assume the collation sequence
- specified for the POSIX Locale, unless another collation
- sequence is specifically defined.
-
- The starting range point and the ending range point shall be a
- collating element or collating symbol. An equivalence class
- expression used as a starting or ending point of a range
- expression produces unspecified results. The ending range point
- shall collate equal to or higher than the starting range point;
- otherwise the expression shall be treated as invalid. The order
- used is the order in which the collating elements are specified
- in the current collation definition. One-to-many mappings (see
- 2.5.2.2) shall not be performed. For example, assuming that the
- character eszet (ß) is placed in the basic collation sequence
- after r and s, but before t, and that it maps to the sequence ss
- for collation purposes, then the expression [r-s] matches only r
- and s, but the expression [s-t] matches s, ß, or t.
-
- The interpretation of range expressions where the ending range
- point also is the starting range point of a subsequent range
- expression is undefined.
-
- The hyphen character shall be treated as itself if it occurs
- first (after an initial ^, if any) or last in the list, or as an
- ending range point in a range expression. As examples, the
- expressions [-ac] and [ac-] are equivalent and match any of the
- characters a, c, or -; the expressions [^-ac] and [^ac-] are
- equivalent and match any characters except a, c, or -; the
- expression [%--] matches any of the characters between % and -,
- inclusive; the expression [--@] matches any of the characters
- between - and @, inclusive; and the expression [a--@] is
- invalid, because the letter a follows the symbol - in the POSIX
- Locale. To use a hyphen as the starting range point, it shall
- either come first in the bracket expression or be specified as a
- collating symbol. For example, [][.-.]-0] matches
- either a right bracket or any character or collating element
- that collates between hyphen and 0, inclusive.
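The range rules in item (7) can be illustrated with a short Python sketch.
It models a collation sequence as a plain list of collating elements in
the order the collation definition specifies them. The function name
expand_range and the two sample sequences are assumptions made for this
illustration only: ESZET_ORDER is a hypothetical fragment matching the
eszet example, and ASCII_ORDER stands in for the POSIX Locale on the
assumption that it orders these characters as ASCII does.

```python
def expand_range(start, end, collation_order):
    """Return the collating elements from start to end, inclusive.

    The order used is the position of each element in collation_order;
    no one-to-many mapping (such as eszet to ss) is applied.
    """
    lo = collation_order.index(start)
    hi = collation_order.index(end)
    if hi < lo:
        # The ending point collates lower than the starting point.
        raise ValueError(f"invalid range expression [{start}-{end}]")
    return collation_order[lo:hi + 1]


# Hypothetical fragment: eszet placed after r and s, but before t.
ESZET_ORDER = ["q", "r", "s", "ß", "t", "u"]

# Assumed stand-in for the POSIX Locale collation of ASCII characters.
ASCII_ORDER = [chr(code) for code in range(128)]
```

Under these assumptions, expand_range("s", "t", ESZET_ORDER) yields s, ß,
and t, while expand_range("a", "-", ASCII_ORDER) raises ValueError, which
corresponds to [a--@] being invalid.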
-
-
- 2.8.3.3 BREs Matching Multiple Characters
-
- The following rules can be used to construct BREs matching multiple
- characters from BREs matching a single character:
-
- (1) The concatenation of BREs shall match the concatenation of the
- strings matched by each component of the BRE.
-
- (2) A subexpression can be defined within a BRE by enclosing it
- between the character pairs \( and \). Such a subexpression
- shall match whatever it would have matched without the \( and
- \), except that anchoring within subexpressions is optional
- behavior; see 2.8.3.5. Subexpressions can be arbitrarily
- nested.
-
- (3) The backreference expression \n shall match the same (possibly
- empty) string of characters as was matched by a subexpression
- enclosed between \( and \) preceding the \n. The character n
- shall be a digit from 1 through 9, specifying the n-th
- subexpression [the one that begins with the n-th \( and ends
- with the corresponding paired \)]. The expression is invalid if
- fewer than n subexpressions precede the \n. For example, the
- expression ^\(.*\)\1$ matches a line consisting of two adjacent
- appearances of the same string, and the expression \(a\)*\1
- fails to match a.
-
- (4) When a BRE matching a single character, a subexpression, or a
- backreference is followed by the special character asterisk (*),
- together with that asterisk it shall match what zero or more
- consecutive occurrences of the BRE would match. For example,
- [ab]* and [ab][ab] are equivalent when matching the string ab.
-
- (5) When a BRE matching a single character, a subexpression, or a
- backreference is followed by an interval expression of the
- format \{m\}, \{m,\}, or \{m,n\}, together with that interval
- expression it shall match what repeated consecutive occurrences
- of the BRE would match. The values of m and n shall be decimal
- integers in the range 0 <= m <= n <= {RE_DUP_MAX}, where m
- specifies the exact or minimum number of occurrences and n
- specifies the maximum number of occurrences. The expression
- \{m\} shall match exactly m occurrences of the preceding BRE,
- \{m,\} shall match at least m occurrences, and \{m,n\} shall
- match any number of occurrences between m and n, inclusive.
-
- For example, in the string abababccccccd the BRE c\{3\} is
- matched by characters seven through nine, the BRE \(ab\)\{4,\}
- is not matched at all, and the BRE c\{1,3\}d is matched by
- characters ten through thirteen.
-
- Multiple adjacent duplication symbols (* and intervals) produce
- undefined results.
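The worked examples in items (3) and (5) can be checked with a rough
Python sketch. Python's re module is not a POSIX BRE implementation: it
writes grouping and intervals without backslashes and does not promise
the matching rules of this standard. The hypothetical helper
bre_to_python below therefore only swaps the escaping of ( ) { } and
escapes + ? |, and it ignores bracket expressions and a leading literal
asterisk. For the specific patterns shown here, the results are expected
to coincide with the positions stated above.

```python
import re


def bre_to_python(bre):
    """Translate a small subset of BRE notation into Python re notation."""
    out = []
    i = 0
    while i < len(bre):
        ch = bre[i]
        if ch == "\\" and i + 1 < len(bre):
            nxt = bre[i + 1]
            # \( \) \{ \} are special in a BRE; Python writes them bare.
            # Other escapes (such as \1 or \.) are passed through.
            out.append(nxt if nxt in "(){}" else ch + nxt)
            i += 2
        else:
            # ( ) { } + ? | are ordinary in a BRE but special in Python.
            out.append("\\" + ch if ch in "(){}+?|" else ch)
            i += 1
    return "".join(out)


def bre_span(bre, text):
    """Return (first, last) 1-based positions of the leftmost match, or None."""
    m = re.search(bre_to_python(bre), text)
    return None if m is None else (m.start() + 1, m.end())
```

For instance, bre_span(r"c\{1,3\}d", "abababccccccd") reports positions
ten through thirteen, and bre_span(r"\(ab\)\{4,\}", "abababccccccd")
reports no match.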
-
-
- 2.8.3.4 BRE Precedence
-
- The order of precedence shall be as shown in Table 2-12, from high to
- low.
-
-
- Table 2-12 - BRE Precedence
- ______________________________________________________________
-
- collation-related bracket symbols    [= =]  [: :]  [. .]
- escaped characters                   \<special character>
- bracket expression                   [ ]
- subexpressions/backreferences        \( \)  \n
- single-character-BRE duplication     *  \{m,n\}
- concatenation
- anchoring                            ^  $
- ______________________________________________________________
-
-
- 2.8.3.5 BRE Expression Anchoring
-
- A BRE can be limited to matching strings that begin or end a line; this
- is called anchoring. The circumflex and dollar-sign special characters
- shall be considered BRE anchors in the following contexts:
-
- (1) A circumflex (^) shall be an anchor when used as the first
- character of an entire BRE. The implementation may treat
- circumflex as an anchor when used as the first character of a
- subexpression. The circumflex shall anchor the expression (or
- optionally subexpression) to the beginning of a string; only
- sequences starting at the first character of a string shall be
- matched by the BRE. For example, the BRE ^ab matches ab in the
- string abcdef, but fails to match in the string cdefab. The BRE
- \(^ab\) may match the former string. A portable BRE shall
- escape a leading circumflex in a subexpression to match a
- literal circumflex.
-
- (2) A dollar-sign ($) shall be an anchor when used as the last
- character of an entire BRE. The implementation may treat a
- dollar-sign as an anchor when used as the last character of a
- subexpression. The dollar-sign shall anchor the expression (or
- optionally subexpression) to the end of the string being
- matched; the dollar-sign can be said to match the
- "end-of-string" following the last character.
-
- (3) A BRE anchored by both ^ and $ shall match only an entire
- string. For example, the BRE ^abcdef$ matches strings
- consisting only of abcdef.
-
-
- 2.8.4 Extended Regular Expressions
-
- The extended regular expression (ERE) notation and construction rules
- shall apply to utilities defined as using extended regular expressions;
- any exceptions to the following rules are noted in the descriptions of
- the specific utilities using EREs.
-
-
- 2.8.4.1 EREs Matching a Single Character or Collating Element
-
- An ERE ordinary character, a special character preceded by a backslash,
- or a period shall match a single character. A bracket expression shall
- match a single character or a single collating element. An ERE matching
- a single character enclosed in parentheses shall match the same as the
- ERE without parentheses would have matched.
-
- 2.8.4.1.1 ERE Ordinary Characters
-
- An ordinary character is an ERE that matches itself. An ordinary
- character is any character in the supported character set, except for the
- ERE special characters listed in 2.8.4.1.2. The interpretation of an
- ordinary character preceded by a backslash (\) is undefined.
-
- 2.8.4.1.2 ERE Special Characters
-
- An ERE special character has special properties in certain contexts.
- Outside of those contexts, or when preceded by a backslash, such a
- character shall be an ERE that matches the special character itself. The
- extended regular expression special characters and the contexts in which
- they shall have their special meaning are:
-
- . [ \ (   The period, left-bracket, backslash, and left-parenthesis
-           are special except when used in a bracket expression (see
-           2.8.3.2).
-
- * + ? {   The asterisk, plus-sign, question-mark, and left-brace are
-           special except when used in a bracket expression (see
-           2.8.3.2). Any of the following uses produces undefined
-           results:
-
-           - If these characters appear first in an ERE, or
-             immediately following a vertical-line, circumflex, or
-             left-parenthesis.
-
-           - If a left-brace is not part of a valid interval
-             expression.
-
- |         The vertical-line is special except when used in a bracket
-           expression (see 2.8.3.2). A vertical-line appearing first
-           or last in an ERE, or immediately following a
-           vertical-line or a left-parenthesis, produces undefined
-           results.
-
- ^         The circumflex shall be special when used
-
-           - As an anchor (see 2.8.4.6), or
-
-           - As the first character of a bracket expression (see
-             2.8.3.2).
-
- $         The dollar-sign shall be special when used as an anchor.
-
- 2.8.4.1.3 Periods in EREs
-
- A period (.), when used outside of a bracket expression, is an ERE that
- shall match any character in the supported character set except NUL.
-
-
- 2.8.4.2 ERE Bracket Expression
-
- The rules for ERE Bracket Expressions are the same as for Basic Regular
- Expressions; see 2.8.3.2.
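Because the bracket expression rules are shared, one scanner can locate
the end of a bracket expression in either kind of RE. The Python sketch
below is a hedged illustration of the rules in 2.8.3.2: a ] that comes
first (after an initial ^, if any) stands for itself, and the sequences
[. [= [: are skipped up to their terminating .] =] or :]. The function
name bracket_end is hypothetical, and the sketch does not validate the
contents of the list.

```python
def bracket_end(pattern, start):
    """Return the index of the ] closing the bracket expression whose
    opening [ is at pattern[start]; raise ValueError if unterminated."""
    if pattern[start:start + 1] != "[":
        raise ValueError("no bracket expression at this position")
    i = start + 1
    if i < len(pattern) and pattern[i] == "^":
        i += 1  # a leading circumflex marks a nonmatching list
    if i < len(pattern) and pattern[i] == "]":
        i += 1  # a right-bracket first in the list represents itself
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[" and i + 1 < len(pattern) and pattern[i + 1] in ".=:":
            # Collating symbol, equivalence class, or character class:
            # skip to the matching terminating sequence.
            closer = pattern[i + 1] + "]"
            j = pattern.find(closer, i + 2)
            if j < 0:
                raise ValueError("unterminated " + pattern[i:i + 2])
            i = j + 2
        elif ch == "]":
            return i
        else:
            i += 1
    raise ValueError("unterminated bracket expression")
```

For the example [][.-.]-0] from 2.8.3.2, the scanner skips the leading ]
and the collating symbol [.-.] and reports the final ] at index 9.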
-
- 2.8.4.3 EREs Matching Multiple Characters
-
- The following rules shall be used to construct EREs matching multiple
- characters from EREs matching a single character:
-
- (1) A concatenation of EREs shall match the concatenation of the
- character sequences matched by each component of the ERE. A
- concatenation of EREs enclosed in parentheses shall match
- whatever the concatenation without the parentheses matches. For
- example, both the ERE cd and the ERE (cd) are matched by the
- third and fourth characters of the string abcdefabcdef.
-
- (2) When an ERE matching a single character, or a concatenation of
- EREs enclosed in parentheses, is followed by the special
- character plus-sign (+), together with that plus-sign it shall
- match what one or more consecutive occurrences of the ERE would
- match. For example, the ERE b+(bc) matches the fourth through
- seventh characters in the string acabbbcde. Also, [ab]+ and
- [ab][ab]* are equivalent.
-
- (3) When an ERE matching a single character, or a concatenation of
- EREs enclosed in parentheses, is followed by the special
- character asterisk (*), together with that asterisk it shall
- match what zero or more consecutive occurrences of the ERE would
- match. For example, the ERE b*c matches the first character in
- the string cabbbcde, and the ERE b*cd matches the third through
- seventh characters in the string cabbbcdebbbbbbcdbc. Also, [ab]*
- and [ab][ab] are equivalent when matching the string ab.
-
- (4) When an ERE matching a single character, or a concatenation of
- EREs enclosed in parentheses, is followed by the special
- character question-mark (?), together with that question-mark it
- shall match what zero or one consecutive occurrences of the ERE
- would match. For example, the ERE b?c matches the second
- character in the string acabbbcde.
-
- (5) When an ERE matching a single character, or a concatenation of
- EREs enclosed in parentheses, is followed by an interval
- expression of the format {m}, {m,}, or {m,n}, together with that
- interval expression it shall match what repeated consecutive
- occurrences of the ERE would match. The values of m and n shall
- be decimal integers in the range 0 <= m <= n <= {RE_DUP_MAX}, where
- m specifies the exact or minimum number of occurrences and n
- specifies the maximum number of occurrences. The expression {m}
- shall match exactly m occurrences of the preceding ERE, {m,}
- shall match at least m occurrences, and {m,n} shall match any
- number of occurrences between m and n, inclusive.
-
- For example, in the string abababccccccd the ERE c{3} is matched
- by characters seven through nine, and the ERE (ab){2,} is
- matched by characters one through six.
-
- Multiple adjacent duplication symbols (+, *, ?, and intervals)
- produce undefined results.
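The positions quoted in the duplication examples above can be reproduced
with a short Python sketch. Python's re module is used here only as a
convenient stand-in: its notation resembles ERE notation for these
patterns, but it is not an implementation of this standard and its
matching rules differ in general. The helper names ere_span and
same_matches are hypothetical.

```python
import re


def ere_span(ere, text):
    """Return (first, last) 1-based positions of the leftmost match, or None."""
    m = re.search(ere, text)
    return None if m is None else (m.start() + 1, m.end())


def same_matches(ere_a, ere_b, samples):
    """Report whether two patterns give the same span on every sample."""
    return all(ere_span(ere_a, s) == ere_span(ere_b, s) for s in samples)


# (pattern, string) pairs taken from the examples in this subclause.
EXAMPLES = [
    ("b+(bc)", "acabbbcde"),
    ("b*c", "cabbbcde"),
    ("b*cd", "cabbbcdebbbbbbcdbc"),
    ("b?c", "acabbbcde"),
    ("c{3}", "abababccccccd"),
    ("(ab){2,}", "abababccccccd"),
]

if __name__ == "__main__":
    for ere, text in EXAMPLES:
        print(ere, text, ere_span(ere, text))
```

A call such as same_matches("[ab]+", "[ab][ab]*", ["ab", "xbaay", "c"])
gives a quick spot check of the stated equivalence on a few sample
strings; it is not a proof.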
-
-
- 2.8.4.4 ERE Alternation
-
- Two EREs separated by the special character vertical-line (|) shall match
- a string that is matched by either. For example, the ERE a((bc)|d)
- matches the string abc and the string ad. Single characters, or
- expressions matching single characters, separated by the vertical bar and
- enclosed in parentheses, shall be treated as an ERE matching a single
- character.
-
- 2.8.4.5 ERE Precedence
-
- The order of precedence shall be as shown in Table 2-13, from high to
- low.
-
-
- Table 2-13 - ERE Precedence
- ______________________________________________________________
-
- collation-related bracket symbols    [= =]  [: :]  [. .]
- escaped characters                   \<special character>
- bracket expression                   [ ]
- grouping                             ( )
- single-character-ERE duplication     *  +  ?  {m,n}
- concatenation
- anchoring                            ^  $
- alternation                          |
- ______________________________________________________________
-
-
- For example, the ERE abba|cde matches either the string abba or the
- string cde (because concatenation has a higher order of precedence than
- alternation).
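The effect of this precedence can be seen by comparing abba|cde with a
version in which parentheses force the alternation to bind first. The
sketch below uses Python's re module purely as an approximation of ERE
notation for these simple patterns; the helper name matches_whole is
hypothetical.

```python
import re


def matches_whole(ere, text):
    """Report whether the pattern matches the entire string."""
    return re.fullmatch(ere, text) is not None


# Concatenation binds more tightly than alternation, so abba|cde means
# (abba)|(cde), not ab(ba|c)de.
UNGROUPED = "abba|cde"
GROUPED = "ab(ba|c)de"
```

Here matches_whole(UNGROUPED, "abba") and matches_whole(UNGROUPED, "cde")
both hold, while the string abbade is accepted only by GROUPED.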
-
-
- 2.8.4.6 ERE Expression Anchoring
-
- An ERE can be limited to matching strings that begin or end a line; this
- is called anchoring. The circumflex and dollar-sign special characters
- shall be considered ERE anchors in the following contexts:
-
- (1) A circumflex (^) shall be an anchor when used anywhere outside a
- bracket expression. The circumflex shall anchor the
- (sub)expression to the beginning of a string; only sequences
- starting at the first character of a string shall be matched by
- the ERE. For example, the EREs ^ab and (^ab) match ab in the
- string abcdef, but fail to match in the string cdefab.
-
- (2) A dollar-sign ($) shall be an anchor when used anywhere outside
- a bracket expression. It shall anchor the expression to the end
- of the string being matched; the dollar-sign can be said to
- match the "end-of-string" following the last character.
-
- (3) An ERE anchored by both ^ and $ shall match only an entire
- string. For example, the EREs ^abcdef$ and (^abcdef$) match
- strings consisting only of abcdef.
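The anchoring examples above can be tried out with the following Python
sketch. As elsewhere, Python's re module is only a stand-in for an ERE
matcher; in particular its $ also matches just before a trailing
<newline>, so the sketch assumes strings that contain no <newline>. The
helper name found is hypothetical.

```python
import re


def found(ere, text):
    """Report whether the pattern matches anywhere in the string."""
    return re.search(ere, text) is not None


# Patterns from the examples, with and without enclosing parentheses.
LEFT_ANCHORED = ["^ab", "(^ab)"]
FULLY_ANCHORED = ["^abcdef$", "(^abcdef$)"]

if __name__ == "__main__":
    for ere in LEFT_ANCHORED:
        print(ere, found(ere, "abcdef"), found(ere, "cdefab"))
    for ere in FULLY_ANCHORED:
        print(ere, found(ere, "abcdef"), found(ere, "xabcdef"))
```

The unanchored pattern ab is found in cdefab, which shows that the
failure of ^ab on that string is due to the anchor alone.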
-
-
- 2.8.5 Regular Expression Grammar
-
- Grammars describing the syntax of both basic and extended regular
- expressions are presented in this subclause. See the grammar conventions
- in 2.1.2.
-
- 2.8.5.1 BRE/ERE Grammar Lexical Conventions
-
- The lexical conventions for regular expressions shall be as described in
- this subclause.
-
- Except as noted, the longest possible token or delimiter beginning at a
- given point shall be recognized.
-
- The following tokens shall be processed (in addition to those string
- constants shown in the grammar):
-
- COLL_ELEM Shall be any single-character collating element,
- unless it is a META_CHAR.
-
- BACKREF (Applicable only to basic regular expressions.) Shall
- be the character string consisting of '\' followed by
- a single-digit numeral, 1 through 9.
-
- DUP_COUNT Shall represent a numeric constant. It shall be an
- integer in the range 0 <= DUP_COUNT <= {RE_DUP_MAX}.
- This token shall only be recognized when the context
- of the grammar requires it. At all other times,
- digits not preceded by '\' shall be treated as
- ORD_CHAR.
-
- META_CHAR Shall be one of the characters:
-
- ^ When found first in a bracket expression
-
- - When found anywhere but first (after an initial
- ^, if any) or last in a bracket expression, or
- as the ending range point in a range expression
-
- ] When found anywhere but first (after an initial
- ^, if any) in a bracket expression.
-
- L_ANCHOR (Applicable only to basic regular expressions.) Shall
- be the character ^ when it appears as the first
- character of a basic regular expression and when not
- QUOTED_CHAR. The ^ may be recognized as an anchor
- elsewhere; see 2.8.3.5.
-
- ORD_CHAR Shall be a character, other than one of the special
- characters in SPEC_CHAR.
-
- QUOTED_CHAR Shall be one of the character sequences:
-
- \^ \. \* \[ \$ \\
-
- R_ANCHOR (Applicable only to basic regular expressions.) Shall
- be the character $ when it appears as the last
- character of a basic regular expression and when not
- QUOTED_CHAR. The $ may be recognized as an anchor
- elsewhere; see 2.8.3.5.
-
- SPEC_CHAR For basic regular expressions, shall be one of the
- following special characters:
-
- . Anywhere outside bracket expressions
-
- \ Anywhere outside bracket expressions
-
- [ Anywhere outside bracket expressions
-
- ^ When an anchor; see 2.8.3.5
-
- $ When an anchor; see 2.8.3.5
-
- * Anywhere except: first in an entire RE;
- anywhere in a bracket expression; directly
- following \(; directly following an anchoring
- ^.
-
- For extended regular expressions, shall be one of the
- following special characters found anywhere outside
- bracket expressions:
-
- ^ . [ $ ( ) | * + ? { \
-
- (The close-parenthesis shall be considered special in
- this context only if matched with a preceding
- open-parenthesis.)
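As an illustration of how the DUP_COUNT constraint might be enforced when
an interval expression is recognized, the Python sketch below parses the
three interval formats and checks the bounds. The function names and the
value 255 used for RE_DUP_MAX are assumptions made for this example; the
actual limit is whatever {RE_DUP_MAX} is on the implementation.

```python
RE_DUP_MAX = 255  # assumed value for illustration only


def _is_count(text):
    """True if text is a nonempty run of decimal digits."""
    return text != "" and all(c in "0123456789" for c in text)


def parse_interval(text, ere=True):
    """Parse {m}, {m,} or {m,n} (written \\{ ... \\} when ere is False).

    Returns (m, n); n is None when no maximum is given.
    """
    open_tok, close_tok = ("{", "}") if ere else ("\\{", "\\}")
    if not (text.startswith(open_tok) and text.endswith(close_tok)):
        raise ValueError("not an interval expression: " + text)
    body = text[len(open_tok):len(text) - len(close_tok)]
    low, comma, high = body.partition(",")
    if not _is_count(low) or (high != "" and not _is_count(high)):
        raise ValueError("DUP_COUNT must be a decimal integer: " + text)
    m = int(low)
    if comma == "":
        n = m          # {m}: exactly m occurrences
    elif high == "":
        n = None       # {m,}: at least m occurrences
    else:
        n = int(high)  # {m,n}: between m and n occurrences
    upper = m if n is None else n
    if not (m <= upper <= RE_DUP_MAX):
        raise ValueError("interval bounds out of range: " + text)
    return (m, n)
```

For example, parse_interval("{1,3}") returns (1, 3), and the BRE spelling
is accepted with parse_interval(r"\{1,3\}", ere=False); an interval such
as {3,1} is rejected because the maximum is lower than the minimum.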
-
-
- 2.8.5.2 RE and Bracket Expression Grammar
-
- This subclause presents the grammar for basic regular expressions,
- including the bracket expression grammar that is common to both BREs and
- EREs.
-
- %token ORD_CHAR QUOTED_CHAR SPEC_CHAR DUP_COUNT
-
- %token BACKREF L_ANCHOR R_ANCHOR
-
- %token Back_open_paren Back_close_paren
- /* '\(' '\)' */
-
- %token Back_open_brace Back_close_brace
- /* '\{' '\}' */
-
- /* The following tokens are for the Bracket Expression
- grammar common to both REs and EREs. */
-
- %token COLL_ELEM META_CHAR
-
- %token Open_equal Equal_close Open_dot Dot_close Open_colon Colon_close
- /* '[=' '=]' '[.' '.]' '[:' ':]' */
-
- %token class_name
- /* class_name is a keyword to the LC_CTYPE locale category */
- /* (representing a character class) in the current locale */
- /* and is only recognized between [: and :] */
-
- %start basic_reg_exp
-
- %%
-
- /* --------------------------------------------
- Basic Regular Expression
- --------------------------------------------
- */
-
- basic_reg_exp : RE_expression
- | L_ANCHOR
- | R_ANCHOR
- | L_ANCHOR R_ANCHOR
- | L_ANCHOR RE_expression
- | RE_expression R_ANCHOR
- | L_ANCHOR RE_expression R_ANCHOR
- ;
-
- RE_expression : simple_RE
- | RE_expression simple_RE
- ;
-
- simple_RE : nondupl_RE
- | nondupl_RE RE_dupl_symbol
- ;
-
- nondupl_RE : one_character_RE
- | Back_open_paren RE_expression Back_close_paren
- | Back_open_paren Back_close_paren
- | BACKREF
- ;
-
- /*
- Note: This grammar does not permit L_ANCHOR or
- R_ANCHOR inside \( and \) (which implies that ^ and $
- are ordinary characters). This reflects the semantic
- limits on the application, as noted in 2.8.3.5.
- Implementations are permitted to extend the language to
- interpret ^ and $ as anchors in these locations, and as
- such portable applications shall not use unescaped ^
- and $ in positions inside \( and \) that might be
- interpreted as anchors.
- */
-
- one_character_RE : ORD_CHAR
- | QUOTED_CHAR
- | '.'
- | bracket_expression
- ;
-
- RE_dupl_symbol : '*'
- | Back_open_brace DUP_COUNT Back_close_brace
- | Back_open_brace DUP_COUNT ',' Back_close_brace
- | Back_open_brace DUP_COUNT ',' DUP_COUNT Back_close_brace
- ;
-
- /* --------------------------------------------
- Bracket Expression
- -------------------------------------------
- */
-
- bracket_expression : '[' matching_list ']'
- | '[' nonmatching_list ']'
- ;
-
- matching_list : bracket_list
- ;
-
- nonmatching_list : '^' bracket_list
- ;
-
- bracket_list : follow_list
- | follow_list '-'
- ;
-
- follow_list : expression_term
- | follow_list expression_term
- ;
-
- expression_term : single_expression
- | range_expression
- ;
-
- single_expression : end_range
- | character_class
- ;
-
- range_expression : start_range end_range
- | start_range '-'
- ;
-
- start_range : end_range '-'
- ;
-
- end_range : COLL_ELEM
- | collating_symbol
- ;
-
- collating_symbol : Open_dot COLL_ELEM Dot_close
- | Open_dot META_CHAR Dot_close
- ;
-
- equivalence_class : Open_equal COLL_ELEM Equal_close
- ;
-
- character_class : Open_colon class_name Colon_close
- ;
-
-
- 2.8.5.3 ERE Grammar
-
- This subclause presents the grammar for extended regular expressions,
- excluding the bracket expression grammar.
- NOTE: The bracket expression grammar and the associated %token lines are
- identical between BREs and EREs. They have been omitted from the ERE
- subclause to avoid unnecessary editorial duplication.
-
-
- %token ORD_CHAR QUOTED_CHAR SPEC_CHAR DUP_COUNT
-
- %start extended_reg_exp
-
- %%
-
- /* --------------------------------------------
- Extended Regular Expression
- --------------------------------------------
- */
-
- extended_reg_exp : anchored_ERE
- | nonanchored_ERE
- | extended_reg_exp '|' nonanchored_ERE
- | extended_reg_exp '|' anchored_ERE
- ;
-
- anchored_ERE : '^' nonanchored_ERE
- | '^' nonanchored_ERE '$'
- | nonanchored_ERE '$'
- | '^'
- | '$'
- | '^' '$'
- ;
-
- nonanchored_ERE : ERE_expression
- | nonanchored_ERE ERE_expression
- ;
-
- ERE_expression : one_character_ERE
- | '(' extended_reg_exp ')'
- | ERE_expression ERE_dupl_symbol
- ;
-
- one_character_ERE : ORD_CHAR
- | '\' SPEC_CHAR
- | '.'
- | bracket_expression
- ;
-
- ERE_dupl_symbol : '*'
- | '+'
- | '?'
- | '{' DUP_COUNT '}'
- | '{' DUP_COUNT ',' '}'
- | '{' DUP_COUNT ',' DUP_COUNT '}'
- ;
-
- BEGIN_RATIONALE
-
-
- 2.8.6 Regular Expression Notation Rationale. (This subclause is not a
- part of P1003.2)
-
- Editor's Note: Some of the text and headings of this rationale have been
- rearranged. Moved text has not been diffmarked unless it changed.
-
- Rather than repeating the description of regular expressions for each
- utility supporting REs, the working group preferred a common,
- comprehensive description of regular expressions in one place. The most
- common behavior is described here, and exceptions or extensions to this
- are documented for the respective utilities, if appropriate.
-
- The Basic Regular Expression corresponds to the ed or historical grep
- type, and the Extended Regular Expression corresponds to the historical
- egrep type (now grep -E).
-
- The text is based on the ed description and substantially modified,
- primarily to aid developers and others in the understanding of the
- capabilities and limitations of regular expressions. Much of this was
- influenced by the internationalization requirements.
-
- It should be noted that the definitions in this clause do not cover the
- tr utility (see 4.64); the tr syntax does not employ regular expressions.
-
- The specification of regular expressions is particularly important to
- internationalization, because pattern matching is a very basic operation
- in business and other applications. The syntax and rules of
- regular expressions are intended to be as intuitive as possible, to make
- them easy to understand and use. The historical rules and behavior do
- not provide that capability to non-English-language users, and do not
- provide the necessary support for commonly used characters and language
- constructs. It was necessary to provide extensions to the historical
- regular expression syntax and rules, to accommodate other languages.
- Such modifications were proposed by the UniForum Technical Committee
- Subcommittee on Internationalization and accepted by the working group.
- As they are limited to bracket expressions, the rationale for these
- modifications can be found in 2.8.6.3.2.
-
-
- 2.8.6.1 Regular Expression Definitions Rationale. (This subclause is not
- a part of P1003.2)
-
- The definition of which sequence is matched when several are possible is
- based on the leftmost-longest rule historically used by deterministic
- recognizers. This rule is much easier to define and describe, and
- arguably more useful, than the first-match rule historically used by
- nondeterministic recognizers. It is thought that dependencies on the
- choice of rule are rare; carefully-contrived examples are needed to
- demonstrate the difference.
-
- A formal expression of the leftmost-longest rule is:
-
- The search is performed as if all possible suffixes of the
- string were tested for a prefix matching the pattern; the
- longest suffix containing a matching prefix is chosen, and
- the longest possible matching prefix of the chosen suffix is
- identified as the matching sequence.
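-
- The formal rule above can be sketched as a brute-force search. The
- Python fragment below is an illustration only and is not part of
- P1003.2; the function name leftmost_longest is hypothetical, and
- Python's re module (a first-match engine with its own syntax) is used
- merely to test whether a candidate substring matches the pattern.

```python
import re

def leftmost_longest(pattern, text):
    """Return (start, end) of the leftmost-longest match, or None.

    Every suffix text[start:] is tried, longest suffix first; within a
    suffix, every prefix is tried, longest prefix first.
    """
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):                 # longest suffix first
        for end in range(len(text), start - 1, -1):    # longest prefix first
            if compiled.fullmatch(text, start, end):
                return start, end
    return None

# The first-match rule and the leftmost-longest rule can disagree:
first_match_span = re.search(r"a|ab", "xab").span()
longest_span = leftmost_longest(r"a|ab", "xab")
```

- Here the first-match engine stops at the alternative "a", while the
- brute-force search reports the longer sequence "ab" starting at the
- same position.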
-
- It is possible to determine what strings correspond to subexpressions by
- recursively applying the leftmost-longest rule to each subexpression, but
- only with the proviso that the overall match is leftmost-longest (see
- 2.8.1.2). For example, matching \(ac*\)c*d[ac]*\1 against acdacaaa
- should match acdacaaa (with \1=a); simply matching the longest match for
- \(ac*\) would yield \1=ac, but the overall match would be smaller
- (acdac). In principle, the implementation must examine every possible
- match and among those that yield the leftmost-longest total matches, pick
- the one that does the longest match for the leftmost subexpression and so
- on. Note that this means that matching by subexpressions is context
- dependent: a subexpression within a larger RE may match a different
- string from the one it would match as an independent RE, and two
- instances of the same subexpression within the same larger RE may match
- different lengths even in similar sequences of characters. For example,
- in the ERE (a.*b)(a.*b), the two identical subexpressions would match
- four and six characters, respectively, of accbaccccb. Thus, it is not
- possible to hierarchically decompose the matching problem into smaller,
- independent, matching problems.
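-
- The two examples in the preceding paragraph can be reproduced with a
- short Python sketch. This is an illustration only: Python's re module
- is a backtracking, first-match engine and not a POSIX.2 matcher, and the
- patterns are written in Python syntax. Forcing the match to cover the
- whole string stands in here for the requirement that the overall match
- be the longest one.

```python
import re

# The BRE \(ac*\)c*d[ac]*\1 written in Python syntax.
pattern = re.compile(r"(ac*)c*d[ac]*\1")
text = "acdacaaa"

first_match = pattern.search(text)      # greedy first match
whole_match = pattern.fullmatch(text)   # must cover the entire string

print(first_match.group(0), first_match.group(1))
print(whole_match.group(0), whole_match.group(1))

# Context dependence: the same subexpression a.*b matches different
# lengths depending on where it appears.
pair = re.fullmatch(r"(a.*b)(a.*b)", "accbaccccb")
alone = re.fullmatch(r"a.*b", "accbaccccb")
lengths = (len(pair.group(1)), len(pair.group(2)), len(alone.group(0)))
print(lengths)
```

- The first-match engine settles on \1=ac and the shorter overall match,
- while the match that covers the whole string requires \1=a.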
-
- Matching is based on the bit pattern used for encoding the character, not
- on the graphic representation of the character. This means that if a
- character set contains two or more encodings for a graphic symbol, or if
- the strings searched contain text encoded in more than one code set, no
- attempt is made to search for any other representation of the encoded
- symbol. If that is required, the user can specify equivalence classes
- containing all variations of the desired graphic symbol.
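-
- As an illustration of matching on the encoding rather than on the
- graphic symbol, the following Python sketch (not part of P1003.2)
- searches for the same letter in text held in two different code sets.
- The choice of ISO 8859-1 and UTF-8 is an assumption made for the
- example only.

```python
import re

word = "caf\u00e9"                      # "cafe" with an acute accent on the e
text_latin1 = word.encode("latin-1")    # the accented e is one byte, 0xE9
text_utf8 = word.encode("utf-8")        # the accented e is two bytes, 0xC3 0xA9

# A pattern built from the ISO 8859-1 encoding of the accented letter.
pattern = re.compile(re.escape("\u00e9".encode("latin-1")))

found_same_codeset = pattern.search(text_latin1) is not None
found_other_codeset = pattern.search(text_utf8) is not None
```

- The pattern finds the letter only in the text that uses the same
- encoding; no attempt is made to recognize the other representation.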
-
- The definition of ``single character'' has been expanded to include also
- collating elements consisting of two or more characters; this expansion
- is applicable only when a bracket expression is included in the BRE or
- ERE. An example of such a collating element may be the Dutch ``ij'',
- which collates as a ``y.'' In some encodings, a ligature ``i with j''
- exists as a character, and would represent a single-character collating
- element. In another encoding, no such ligature exists, and the two-
- character sequence ``ij'' is defined as a multicharacter collating
- element. Outside brackets, the ``ij'' is treated as a two-character RE
- and will match the same characters in a string. Historically, a bracket
- expression only matched a single character. If, however, the bracket
- expression defines, for example, a range that includes ``ij'', then this
- particular bracket expression will also match a sequence of the two
- characters ``i'' and ``j'' in the string.
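-
- A minimal Python sketch of this behavior follows. It is not part of
- P1003.2; the function name and the representation of a bracket
- expression as a set of collating elements are hypothetical, and the
- sketch assumes that the longest matching element is preferred.

```python
def bracket_match_length(text, pos, members):
    """Return how many characters a bracket expression consumes at pos.

    members is the set of collating elements that the bracket expression
    matches in the current (hypothetical) locale; 0 means no match.
    """
    best = 0
    for element in members:
        if element and text.startswith(element, pos):
            best = max(best, len(element))
    return best

# With "ij" defined as a collating element, two characters are consumed.
with_ij = bracket_match_length("dijk", 1, {"i", "j", "ij"})
# Without it, the bracket expression matches the single character "i".
without_ij = bracket_match_length("dijk", 1, {"i", "j"})
```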
-
-
- 2.8.6.2 Regular Expression General Requirements Rationale. (This
- subclause is not a part of P1003.2)
-
- Historically, most regular expression implementations only match lines,
- not strings. However, that is more an effect of the usage than of an
- inherent feature of regular expressions themselves. Consequently, POSIX.2
- does not regard <newline>s as special; they are ordinary characters, and
- both a period and a nonmatching list can match them. Those utilities
- (like grep) that do not allow <newline>s to match are responsible for
- eliminating any <newline> from strings before matching against the RE.
- The regcomp() function, however, can provide support for such processing
- without violating the rules of this clause.
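-
- The division of labor described above can be sketched in Python as
- follows. This is an illustration only and not part of P1003.2: the
- function grep_lines is hypothetical, and the re.DOTALL flag is used so
- that the period treats <newline> as an ordinary character, which is not
- the default in Python.

```python
import re

def grep_lines(pattern, text):
    """Yield the lines of text that contain a match, in the style of grep."""
    compiled = re.compile(pattern, re.DOTALL)   # '.' may match a newline
    for line in text.split("\n"):               # newline removed before matching
        if compiled.search(line):
            yield line

# The matcher itself treats the newline as an ordinary character ...
spans_newline = re.search(r"a.b", "a\nb", re.DOTALL) is not None
# ... so it is the utility that keeps matches from crossing lines.
selected = list(grep_lines(r"a.b", "a\nb\naxb"))
```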
-
- The definition of case-insensitive processing is intended to allow
- matching of multicharacter collating elements as well as characters. For
- instance, as each character in the string is matched using both its
- cases, the RE [[.Ch.]], when matched against char, is in reality matched
- against ch, Ch, cH, and CH.
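-
- The set of spellings mentioned above can be generated mechanically. The
- Python sketch below is an illustration only; the function name is
- hypothetical, and Python's str.lower() and str.upper() stand in for the
- case mapping of the current locale.

```python
from itertools import product

def case_variants(element):
    """Return every upper/lower-case spelling of a collating element."""
    choices = [sorted({ch.lower(), ch.upper()}) for ch in element]
    return {"".join(combo) for combo in product(*choices)}

variants = case_variants("Ch")
```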
-
- Some implementations of egrep have had very limited flexibility in
- handling complex extended regular expressions. POSIX.2 does not attempt
- to define the complexity of a BRE or ERE, but does place a lower limit on
- it--any regular expression must be handled, as long as it can be
- expressed in 256 bytes or less. (Of course, this does not place an upper
- limit on the implementation.) There are existing programs using a
- nondeterministic-recognizer implementation that should have no difficulty
- with this limit. It is possible that a good approach would be to attempt
- to use the faster, but more limited, deterministic recognizer for simple
- expressions and to fall back on the nondeterministic recognizer for those
- expressions requiring it. Nondeterministic implementations must be
- careful to observe the 2.8.1.2 rules on which match is chosen; the
- longest match, not the first match, starting at a given character is
- used.
-
- The term ``invalid'' highlights a difference between this clause and some
- others: POSIX.2 frequently avoids mandating errors for syntax
- violations because they can be used by implementors to trigger
- extensions. However, the authors of the internationalization features of
- regular expressions desired to mandate errors for certain conditions to
- identify usage problems or nonportable constructs. These are identified
- within this rationale as appropriate. The remaining syntax violations
- have been left implicitly or explicitly undefined. For example, the BRE
- construct \{1,2,3\} does not comply with the grammar. A conforming
- application cannot rely on it either producing an error or matching the
- literal characters \{1,2,3\}. The term ``undefined'' was used in favor of
- ``unspecified'' because many of the situations are considered errors on
- some implementations and it was felt that consistency throughout the
- clause was preferable to mixing undefined and unspecified.
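-
- A portable application can check the shape of an interval expression
- before using it. The Python sketch below is an illustration only; it
- follows the three Back_open_brace productions of RE_dupl_symbol in
- 2.8.5.2, the function name is hypothetical, and no limits on the
- counts themselves are checked.

```python
import re

# \{m\}, \{m,\} or \{m,n\}, following the RE_dupl_symbol productions.
_BRE_INTERVAL = re.compile(r"\\\{[0-9]+(,[0-9]*)?\\\}")

def is_bre_interval(text):
    """True if text has the shape of a BRE interval expression."""
    return _BRE_INTERVAL.fullmatch(text) is not None

conforming = [is_bre_interval(s) for s in (r"\{3\}", r"\{1,\}", r"\{1,2\}")]
undefined = is_bre_interval(r"\{1,2,3\}")
```

- The construct \{1,2,3\} fails the check, so its behavior is undefined
- and a conforming application should not use it.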
-
-
- 2.8.6.3 Basic Regular Expressions Rationale. (This subclause is not a
- part of P1003.2)
-
- 2.8.6.3.1 BREs Matching a Single Character or Collating Element
- Rationale. (This subclause is not a part of P1003.2)
-
- 2.8.6.3.2 RE Bracket Expression Rationale. (This subclause is not a part
- of P1003.2)
-
- If a bracket expression must specify both - and ], then the ] must be
- placed first (after the ^, if any) and the - last within the bracket
- expression.
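-
- The placement rule can be applied mechanically when a bracket
- expression is generated from a set of characters. The Python sketch
- below is an illustration only and not part of P1003.2; the function
- make_bracket is hypothetical. Python's re module, which accepts the
- same placement of ] and -, is used only to check the result.

```python
import re

def make_bracket(chars, negate=False):
    """Build a bracket expression matching exactly the given characters."""
    members = set(chars)
    body = ""
    if "]" in members:
        body += "]"                       # ] is placed first
    body += "".join(sorted(members - {"]", "-", "^"}))
    if "^" in members:
        body += "^"                       # keep ^ away from the first position
    if "-" in members:
        body += "-"                       # - is placed last
    if not body:
        raise ValueError("an empty bracket expression is not valid")
    if not negate and body.startswith("^"):
        raise ValueError("a leading ^ would be read as negation")
    return "[" + ("^" if negate else "") + body + "]"

expr = make_bracket("-]a")
matched = [ch for ch in "-]ab" if re.fullmatch(expr, ch)]
```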
-
- Range expressions are, historically, an integral part of regular
- expressions. However, the requirements of ``natural language behavior''
- and portability do conflict: ranges must be treated according to the
- current collating sequence, and include such characters that fall within
- the range based on that collating sequence, regardless of character
- values. This, however, means that the interpretation will differ
- depending on collating sequence. If, for instance, one collating
- sequence defines ``ä'' as a variant of ``a'', while another defines it as
- a letter following ``z'', then the expression [ä-z] is valid in the first
- language and invalid in the second. This kind of ambiguity should be
- avoided in portable applications, and therefore the working group elected
- to state that ranges must not be used in strictly conforming
- applications; however, implementations must support them.
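-
- The dependence of a range on the collating sequence can be sketched in
- Python as follows. This is an illustration only: the two collating
- sequences are hypothetical and heavily abbreviated, and the function
- name is not taken from any standard interface.

```python
A_UMLAUT = "\u00e4"   # a with diaeresis

def range_is_valid(start, end, collating_order):
    """True if start does not collate after end in the given order."""
    return collating_order.index(start) <= collating_order.index(end)

# Sequence in which the accented letter is a variant of "a".
variant_of_a = ["a", A_UMLAUT, "b", "z"]
# Sequence in which the accented letter is a separate letter after "z".
after_z = ["a", "b", "z", A_UMLAUT]

valid_in_first = range_is_valid(A_UMLAUT, "z", variant_of_a)
valid_in_second = range_is_valid(A_UMLAUT, "z", after_z)
```

- The same range expression is accepted under the first sequence and
- rejected under the second, which is the portability problem described
- above.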
-
- Some historical implementations allow range expressions where the ending
- range point of one range is also the starting point of the next (for
- instance [a-m-o]). This behavior should not be permitted, but to avoid
- breaking existing implementations, it is now undefined whether it is a
- valid expression, and how it should be interpreted.
-
- Current practice in awk and lex is to accept escape sequences in bracket
- expressions as per Table 2-15, while the normal regular expression
- behavior is to regard such a sequence as consisting of two characters.
- Allowing the awk/lex behavior in regular expressions would change the
- normal behavior in an unacceptable way; it is expected that awk and lex
- will decode escape sequences in regular expressions before passing them
- to regcomp() or comparable routines. Each utility describes the escape
- sequences it accepts as an exception to the rules in this clause; the
- list is not the same, for historical reasons.
-
- As noted earlier, the new syntax and rules have been added to accommodate
- languages other than English. These modifications were proposed by the
- UniForum Subcommittee on Internationalization and accepted by the working
- group. The remainder of this clause describes the rationale for these
- modifications.
-
- Internationalization Requirements
-
- The goal of the internationalization effort was to provide functions and
- capabilities that matched the capabilities of existing implementations,
- but that adhered to the user's local customs, rules, and environment.
- This has also been described as ``removing the ASCII (and English
- language) bias.''
-
- In addition, other requirements also influence the standardization
- efforts, such as portability, extensibility, and compatibility.
-
- In a worldwide environment portability carries much weight. Wherever
- feasible, users should be given the capability to develop code that can
- execute independently of character set, code set, or language.
-
- Standards must also be extensible: to support further development, to
- allow for local or regional extensions, or to accommodate new concepts
- (such as multibyte characters).
-
- Compatibility does not only refer to support of existing code, but also
- to making the new syntax, semantics, and functions compatible with
- existing environments and implementations.
-
- Internationalization Technical Background
-
- The C Standard {7} (and, by implication, also POSIX) recognizes that the
- ASCII character set used in historical UNIX system implementations is not
- adequate outside the Anglo-American language area. It is, however, not
- enough to remove the ASCII bias; the dependency on Anglo-Saxon
- conventions and rules must also be broadened to accommodate other
- cultures, including those that require thousands of characters.
-
- Character sets are defined by their attributes; typical attributes are
- the encoding, the collating sequence, the character classification, and
- the case mapping.
-
- It is also recognized that, even within one language area, several
- combinations of attributes exist: character set attributes are mutable
- and combinatory. So, rather than replacing one straitjacket by another,
- the proposed standards make character sets user-definable and
- program-selectable.
-
- The existence of character set attributes is implicit in regular
- expressions (REs). This implies that regular expressions must recognize
- and adapt to the program-selected set of attributes.
-
- A program selects the appropriate character set (or combination of
- attributes) using the mechanism described in 2.5. The definition of a
- character set (its attributes) is external to an executing program. Many
- combinations of attributes can exist concurrently. Of particular
- interest are the following attributes:
-
- (1) Collating Sequence. In existing implementations, the encoded
- ASCII ordering matches the logical English collating sequence.
- This correspondence does not exist for all code sets or
- languages. In addition, many languages employ concepts that
- have no counterparts in English collation:
-
- (a) In many languages, ordering is based on the concept of
- string collation rather than character collation as in
- English. One of the effects of this is that the ordering
- is based on collating elements rather than on characters.
- Characters typically map into collating elements:
-
- One-to-one mapping, where a character is also a
- collating element,
-
- One-to-N mapping, where a single character maps into
- two or more collating elements (as the German ``ß''
- (eszet), which collates as ``ss''),
-
- N-to-one mapping, where two or more characters map into
- one collating element (as in the Spanish ``ll'',
- which collates between ``l'' and ``m''; i.e., a word
- beginning with ``ll'' collates after a word beginning
- with ``lo'').
-
- (b) A common method for adding characters to an alphabet is to
- use diacritical marks, such as accents or circumflex
- (^). In some languages, this creates a completely new
- character, collated differently from the Latin ``base.''
- In other languages these accented characters are collated
- as variants of the Latin base letter; i.e., they have the
- same relative order; they are equivalent.
-
- If the strings (words) being compared are equal except for
- ``accents,'' the strings can be ordered based on a
- secondary ordering within the ``equivalence class.'' For
- instance, in French, the words ``tache'', ``tâche'', and
- ``tacheter'' collate in that order.
-
- The C Standard {7} recognizes this; it includes new library
- functions capable of handling complex collation rules. These
- functions depend on the setlocale() category
- LC_COLLATE for a definition of the current collation rules.
-
- (2) Character Classification. Character classification and case
- mapping are another area where each language (or even language
- area) has its own rules. Although users in different countries
- can use the same code set, such as ISO 8859-1 {5}, the
- definition of what constitutes a letter or an uppercase letter
- may vary.
-
- The C Standard {7} recognizes this; library functions used to
- classify characters or perform case mapping depend on the
- setlocale() category LC_CTYPE for a definition of how characters
- map to character classes.
-
- Internationalization Proposal Areas
-
- Based on the requirements and attribute characteristics defined above,
- and after reviewing proposals and definitions by X/Open and other
- organizations, the UniForum Subcommittee on Internationalization decided
- to concentrate on the following areas: the range expression, character
- classes, the definition of one-character RE (multicharacter element), and
- equivalence classes.
-
- Most of these are heavily dependent on the current definition of
- collation sequence; the Subcommittee felt it natural to couple the
- capabilities and interpretation of bracket expressions closely to the
- requirements for extended collation capabilities.
-
- In addition, the Subcommittee felt that the capabilities described in 2.5
- formed a suitable basis for runtime control of regular expression
- behavior.
-
- The Subcommittee realized that the mechanism selected requires changes in
- the existing syntax. As a rule, the Subcommittee wished to minimize
- changes and avoid syntactical changes that may cause existing regular
- expressions to fail.
-
- (1) Collating Elements and Symbols. As noted above, many
- expressions within a bracket expression are closely connected
- with collation, and the Subcommittee defined many capabilities
- in terms of collating elements and collating symbols.
-
- A collating element is defined as a sequence of one or more
- bytes defined in the current collating sequence definition as a
- unit of collation. In most cases, a collating element is equal
- to a character, but the collation sequence may exclude some
- characters, or define two or more characters as a collating
- element.
-
- A one-character RE is, logically enough, defined as one
- character or something that translates into one character (the
- number of bits used to represent the character is not an issue
- here). The expression within square brackets is a one-character
- RE; i.e., single characters are matched against the list of
- single characters defined within the brackets.
-
- In Spanish, the phrase ``a to d'' means the sequence of
- collating elements a, á, b, c, ch, and d. Consequently, with a
- Spanish character set, the range statement [a-d] includes the ch
- collating element, even though it is expressed with two
- characters (N-to-1 mapping).
-
- The historical syntax, however, does not allow the user to
- define either the range from a through ch, or to define ch as a
- single character rather than as either c or h.
-
- The Subcommittee decided that N-to-1 mappings be recognized (if
- properly delimited), as one-character REs inside, but not
- outside, square brackets (e.g., a period will never match ch).
-
- To be distinguishable from a list of the characters themselves,
- the multicharacter element must be delimited from the remainder
- of the characters in the string. The characters [. and .] are
- used to delimit a multicharacter collating element from other
- elements, and can be used to delimit single-character collating
- elements.
-
- (2) Equivalence Classes. As stated previously, many languages
- extend the Latin alphabet by using diacritical marks. In some
- cases, the Latin base character (e.g., a) and the accented
- versions of the base (e.g., à, â in French) constitute a
- ``subclass'' of characters with some partially equivalent
- characteristics but different code values. Because these
- characters are related, they are often processed as a group.
- The historical syntax, however, does not provide for this in a
- portable manner.
-
- Although it represents an extension of the historical
- capabilities, the X/Open group strongly recommended that a
- properly delimited collating element be recognized as
- representing an equivalence class, that is as the collating
- element itself, and all other characters with the same primary
- order in the collation sequence.
-
- The Subcommittee supported this recommendation, and also
- selected [= and =] as delimiters for equivalence classes.
-
- (3) Range Expressions. The hyphen historically indicated ``a range
- of consecutive ASCII characters;'' typically it stands for the
- word ``to,'' as in ``a to z,'' and implies an ordered interval.
- In ASCII, the encoded order matches the logical English order;
- this is not true with other encodings or with other alphabets.
-
- If the ASCII dependency is removed, an alternative could have
- been to use the encoded sequence of whatever code set is
- currently used. This, however, would certainly decrease
- portability, as well as requiring the user to know the ordering
- of the current code set. It would also most certainly be
- counter-intuitive; a French user would expect the expression
- [a-d] to match any of the letters a, à, â, b, c, ç, or d. The
- Subcommittee regards this interpretation of ranges as most
- compatible with existing capabilities, and one that provides for
- the desired portability.
-
- As the logical ordering need not be inherent in the encoded
- sequence, an external definition was required. Such a
- definition was already present via the collating sequence
- attribute of the character set. The setlocale() function
- provides for an LC_COLLATE category, which defines the current
- collating sequence. The Subcommittee selected this as the basis
- for the interpretation of ranges, as well as of equivalence
- classes and multicharacter collating symbols.
-
- (4) Character Classes. The range expression is commonly used to
- indicate a character class; the ex(au_cmd) section of the SVID
- states: ``... a pair of characters separated by - defines a
- range (e.g., a-z defines any lowercase letter)....'' In
- reality, [a-z] means ``any lowercase letter between a and z,
- inclusive.'' This is only equivalent to ``any lowercase
- letter'' if a is the first and z is the last lowercase
- letter in the collating sequence.
-
- To provide the intended capabilities in a portable way, the
- Subcommittee introduced a new syntactical element, namely an
- explicit character class. The definition of which characters
- constitute a specific character class is already present via the
- LC_CTYPE category of the setlocale() function.
-
- The Subcommittee selected the identification of character
- classes by name, bracketed by [: and :]. A character class
- cannot be used as an endpoint in a range statement.
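-
- The four proposal areas above can be tied together in a small Python
- sketch. It is an illustration only and not part of P1003.2: the
- collation table, the class table, and all function names are
- hypothetical, and the Spanish-like locale is heavily abbreviated.

```python
A_ACUTE = "\u00e1"   # a with acute accent

# Each entry is (collating element, primary weight); elements that share
# a primary weight form an equivalence class.
COLLATION = [
    ("a", 1), (A_ACUTE, 1),
    ("b", 2), ("c", 3), ("ch", 4), ("d", 5), ("e", 6),
]
# Character classes hold characters only, not multicharacter elements.
CLASSES = {"lower": {"a", A_ACUTE, "b", "c", "d", "e"}}

ORDER = [element for element, _ in COLLATION]
WEIGHT = dict(COLLATION)

def in_range(element, start, end):
    """[start-end]: element collates between the two endpoints."""
    return ORDER.index(start) <= ORDER.index(element) <= ORDER.index(end)

def equivalence_class(element):
    """[=element=]: every element with the same primary weight."""
    return {e for e, weight in COLLATION if weight == WEIGHT[element]}

def in_character_class(char, name):
    """[:name:]: membership comes from the locale's class table."""
    return char in CLASSES[name]

ch_in_a_to_d = in_range("ch", "a", "d")
e_in_a_to_d = in_range("e", "a", "d")
variants_of_a = equivalence_class("a")
```

- In this table the range a to d includes the two-character element ch,
- and the equivalence class of a contains both a and its accented
- variant.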
-
- Internationalization Syntax
-
- The Subcommittee was careful to propose changes in the regular expression
- syntax that minimize the impact on existing REs. In evaluating
- alternatives, the Subcommittee looked at ease of use (terseness, ease of
- recall, keyboard availability), impact on historical REs
- (compatibility), implementability, performance and how error-prone the
- syntax is likely to be (ambiguity).
-
- The Subcommittee made the following evaluation:
-
- (1) Syntax changes must be limited to expressions within square
- brackets.
-
- (2) Strings or characters with special meaning must be delimited
- from ordinary strings, to avoid compatibility problems.
-
- (3) Both the initial and the terminating delimiter should consist of
- two characters, to minimize compatibility and ambiguity problems.
-
- (4) The outer delimiter character should be bracketing; i.e., naturally
- indicate initial and terminating side. Examples: {} <> ().
-
- (5) The brackets ([]) are, due to the special rules for ``brackets
- within brackets,'' rather unlikely to be used in the intended
- way (a closing bracket must precede an open bracket in the
- existing syntax).
-
- (6) To minimize ambiguity, brackets must be paired with another
- character. Many other symbols are already in use, either within
- regular expressions, or in the shell. Examples of usable
- characters are: = . :
-
- (7) Because a multicharacter collating element also can be a member
- of an equivalence class, different delimiters must be chosen for
- these two expressions. Also, the character class expression
- must be distinguishable from, e.g., multicharacter collating
- symbols; although no historical example is known to the
- Subcommittee, prudence dictated that character classes be given
- separate delimiters.
-
- (8) The Subcommittee selected the period as the secondary delimiter
- for multicharacter collating symbols.
-
- (9) The Subcommittee selected the equals-sign as the secondary
- delimiter for equivalence classes.
-
- (10) The Subcommittee selected the colon as the secondary delimiter
- for character classes.
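-
- The effect of choosing three distinct two-character delimiters can be
- seen in a small tokenizer. The Python sketch below is an illustration
- only and not part of P1003.2; the names are hypothetical, the input is
- the text between the outer square brackets, and range expressions are
- not interpreted (a hyphen is returned as an ordinary character).

```python
DELIMITERS = {
    "[.": ("collating_symbol", ".]"),
    "[=": ("equivalence_class", "=]"),
    "[:": ("character_class", ":]"),
}

def split_bracket_body(body):
    """Split the inside of a bracket expression into (kind, text) terms."""
    terms = []
    pos = 0
    while pos < len(body):
        opener = body[pos:pos + 2]
        if opener in DELIMITERS:
            kind, closer = DELIMITERS[opener]
            end = body.find(closer, pos + 2)
            if end == -1:
                raise ValueError("unterminated " + kind)
            terms.append((kind, body[pos + 2:end]))
            pos = end + 2
        else:
            terms.append(("char", body[pos]))
            pos += 1
    return terms

terms = split_bracket_body("[:digit:][.ch.][=a=]x")
```

- Because each kind of expression has its own opening pair, a single
- look at two characters is enough to tell the three apart.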
-
- The specific syntax and facilities described in this clause represent a
- coalescence of proposals and implementations from several vendors. Due
- to differences in facilities and syntax, it was not possible to take one
- implementation and codify it. There are now several implementations
- closely patterned on the existing proposal.
-
- The facilities presented in this clause are described in a manner that
- does not preclude their use with multibyte character sets. However, no
- attempt has been made to include facilities specifically intended for
- such character sets.
-
- The definition of character classes is tied to the LC_CTYPE definition.
- The set of character classes defined in the C Standard {7} represents the
- minimum set of character classes required worldwide, i.e., those required
- by all implementations. It is the working group's belief that local
- standards bodies, as well as individual vendors, will provide extensions
- to the standard in these areas, for instance to provide Kanji character
- classes.
-
- In many historical implementations, an invalid range is treated as if it
- consisted of the endpoints only. For example, [z-a] is treated as [za].
- Some implementations treat the above range as [z], and others as [-az].
- Neither is correct, and the working group decided that this should be
- treated as an error.
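-
- The decision above can be observed through the regcomp() interface. The
- following is a minimal sketch and not part of the draft: it assumes a
- POSIX-style regcomp() from <regex.h>, and compile_status() is a
- hypothetical helper name. Only the fact of failure is examined; the
- particular error code (commonly REG_ERANGE) is left unchecked.
-
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Compile `pattern` as a BRE and return the regcomp() result:
 * 0 on success, a nonzero REG_* error code otherwise. */
static int compile_status(const char *pattern)
{
    regex_t re;
    int rc;

    assert(pattern != NULL);
    rc = regcomp(&re, pattern, REG_NOSUB);
    if (rc == 0)
        regfree(&re);   /* only a successfully compiled regex_t is freed */
    return rc;
}
```
-
- Under these assumptions, compile_status("[a-z]") is expected to return
- zero in the POSIX locale, while compile_status("[z-a]") is expected to
- return a nonzero error code rather than behaving like [za].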
-
- It was proposed that the syntax for bracket expressions be simplified
- such that the ``extra'' brackets are not needed if the bracket expression
- only consists of a character class, an equivalence class, or a collating
- symbol: ``[:alpha:]'' instead of ``[[:alpha:]]''. To ensure
- unambiguity, if a bracket expression starts with :, =, or ., then it
- cannot contain a class expression or a collating symbol (or duplicated
- characters). In addition, it was proposed that only valid class or
- collating symbol expressions be accepted: e.g., [[:ctrl:]] is an invalid
- expression. The working group rejected the proposal. While the syntax
- [:alpha:] may be intuitive to some, the proposal does not allow, e.g.,
- [:digit:.ch.]. The alternative, to require additional brackets for the
- latter case, would probably cause more errors than the historical syntax.
- Requiring erroneous class expressions or collating symbols to make the
- regular expression invalid may minimize the risks for inadvertent
- spelling errors. However, at this point it was judged that this would
- reduce consensus.
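-
- Because the simplified syntax was rejected, [:alpha:] written without
- the outer brackets remains an ordinary bracket expression listing the
- characters ``:'', ``a'', ``l'', ``p'', and ``h''. The sketch below
- illustrates the difference; it assumes a POSIX-style <regex.h> and the
- POSIX locale, and bre_matches() is a hypothetical helper name.
-
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if `text` contains a match for the BRE `pattern`, 0 if it
 * does not, and -1 if the pattern fails to compile. */
static int bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```
-
- With this helper, bre_matches("[[:alpha:]]", "z") is expected to report
- a match, whereas bre_matches("[:alpha:]", "z") is not, because ``z'' is
- not one of the listed characters.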
-
- Consideration was given to eliminating the [.ch.] syntax and providing
- that collating elements should be recognized as such both inside and
- outside bracket expressions. In addition, consideration was given to
- defining character classes such that collating elements are included.
- The working group rejected these proposals. The [.ch.] syntax is only
- required inside bracket expressions due to the fact that a bracket
- expression historically only matched a single character. If ch is a
- collating element, a range [a-z] (if ``ch'' falls within it) matches ch.
- Outside brackets, an expression ch is treated as two concatenated
- characters, matching the string ``ch''. The [.ch.] expression is
- intended to allow the specification of a multicharacter collating element
- separately from ranges in a bracket expression. Character classes are
- not intended to include collating elements; there is no requirement that
- all characters in a multicharacter collating element belong to the same
- character class (for instance ``Ch'' is ``alpha'' but neither ``upper''
- nor ``lower''). Introducing collating elements in character classes
- would be nonintuitive.
-
- It was suggested that, because ranges may or may not be meaningful (or
- even accepted) based on the current collating sequence, they should be
- eliminated from the syntax (or at least marked obsolescent). It was
- suggested that, e.g., [z-a] should always be or never be an error,
- regardless of collating sequence. The working group did not wish to
- eliminate ranges from the syntax. While it is true that ranges may not
- be universally portable, they are nevertheless a useful and fundamental
- construct in regular expressions. The regular expression syntax has
- consciously been extended to provide both increased portability and
- extended local capabilities. Where supported, ranges must reflect the
- current collating sequence. The working group instead elected to include
- range expressions as an implementation requirement, but state that
- strictly conforming applications (but not, e.g., National-Body-conforming
- applications) shall not use range expressions. Treating erroneous ranges
- as invalid points out that these may not be portable across collating
- sequences, and is better than (silently) making them behave in a way
- contrary to the intent of the user.
-
- Earlier drafts allowed the use of an equivalence class expression as the 2
- starting or ending point of a range expression, such as [[=e=]-f]. This 2
- now produces unspecified results because it is possible to define the 2
- equivalence class as a disjoint set of characters. This example could 2
- produce different results on various systems: 2
-
- - An error. 2
-
- - The equivalent of [[=e=]e-f] (which is the correct portable way to 2
- include equivalence class effects in a bracket expression). 2
-
- - All of the collating elements from the lowest value found in the 2
- equivalence class, including any of the elements found between the 2
- disjoint values. 2
-
- Consideration was given to saying that equivalence classes with disjoint 2
- elements produce unspecified results at the start or end of a range, but 2
- since the application cannot predict which equivalence classes are 2
- disjoint, this is no improvement over the more general statement chosen. 2
-
- It was suggested that, while reference to nonprintable characters is
- partially supported by the proposed set of character classes, the
- specificity is not precise enough, and that additional character classes
- should be supported, e.g., [:tab:] or [:a:]. The working group rejected
- this proposal, because this feature would represent a substantial
- enhancement to the current regular expression syntax, and one that cannot
- be based on internationalization requirements. It is judged that its
- inclusion would reduce consensus. A future revision of regular
- expressions should study the capability to create temporary character
- classes for use in regular expressions; a ``character class macro
- facility.''
-
- 2.8.6.3.3 BREs Matching Multiple Characters Rationale. (This subclause
- is not a part of P1003.2)
-
- The limit of nine backreferences to subexpressions in the RE is based on
- the use of a single digit identifier; increasing this to multiple digits
- would break historical applications. This does not imply that only nine 1
- subexpressions are allowed in REs. The following is a valid BRE with ten 1
- subexpressions: 1
-
- \(\(\(ab\)*c\)*d\)\(ef\)*\(gh\)\{2\}\(ij\)*\(kl\)*\(mn\)*\(op\)*\(qr\)* 1
-
- The working group regards the common current behavior, which supports
- \n*, but not \n\{min,max\}, or \(...\)*, or \(...\)\{min,max\}, as a
- nonintentional result of a specific implementation, and supports both
- duplication and interval expressions following subexpressions and
- backreferences.
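-
- The points above can be checked against an implementation as sketched
- below. This is an illustration only: it assumes a POSIX-style <regex.h>
- whose regcomp() records the subexpression count in re_nsub, and the
- helper names bre_subexpr_count() and bre_matches() are hypothetical.
-
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return the number of parenthesized subexpressions in the BRE `pattern`
 * (the re_nsub member set by regcomp()), or -1 if it does not compile. */
static long bre_subexpr_count(const char *pattern)
{
    regex_t re;
    long count;

    assert(pattern != NULL);
    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    count = (long)re.re_nsub;
    regfree(&re);
    return count;
}

/* Return 1 if `text` contains a match for the BRE `pattern`, 0 if it
 * does not, and -1 if the pattern does not compile. */
static int bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```
-
- Passing the ten-subexpression BRE shown above to bre_subexpr_count() is
- expected to yield 10, even though only \1 through \9 can be used as
- backreferences. Interval expressions after a subexpression or a
- backreference, such as \(ab\)\{2\} or \(a\)\1\{2\}, are expected to be
- accepted by bre_matches().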
-
- 2.8.6.3.4 Expression Anchoring Rationale. (This subclause is not a part
- of P1003.2)
-
- Often, the dollar-sign is viewed as matching the ending <newline> in text
- files. This is not strictly true; the <newline> is typically eliminated
- from the strings to be matched and the dollar-sign matches the
- terminating null character.
-
- The ability of ^, $, and * to be nonspecial in certain circumstances may 1
- be confusing to some programmers, but this situation was changed only in 1
- a minor way from historical practice to avoid breaking many existing 1
- scripts. Some consideration was given to making the use of the anchoring 1
- characters undefined if not escaped and not at the beginning or end of 1
- strings. This would cause a number of historical BREs, such as 2^10, 1
- $HOME, and $1.35, which relied on the characters being treated literally, 1
- to become invalid. 1
-
- However, one relatively uncommon case was changed to allow an extension 1
- used on some implementations. Historically, the BREs ^foo and \(^foo\) 1
- did not match the same string, despite the general rule that 1
- subexpressions and entire BREs match the same strings. To achieve 1
- balloting consensus, POSIX.2 has allowed an extension on some systems to 1
- treat these two cases in the same way by declaring that anchoring may 1
- occur at the beginning or end of a subexpression. Therefore, portable 1
- BREs that require a literal circumflex at the beginning or a dollar-sign 1
- at the end of a subexpression must escape them. Note that a BRE such as 1
- a\(^bc\) will either match a^bc or nothing on different systems under the 1
- POSIX.2 rules. 1
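-
- The literal treatment of anchoring characters in the middle of a BRE
- can be sketched as follows. The example assumes a POSIX-style <regex.h>;
- bre_matches() is a hypothetical helper name. The a\(^bc\) case is
- deliberately not exercised, because its result differs between systems.
-
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if `text` contains a match for the BRE `pattern`, 0 if it
 * does not, and -1 if the pattern does not compile. */
static int bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_NOSUB) != 0)   /* cflags without
                                                    REG_EXTENDED: BRE */
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```
-
- Here bre_matches("2^10", "2^10") and bre_matches("$HOME", "echo $HOME")
- are expected to report matches, since neither the circumflex nor the
- dollar-sign is in an anchoring position, while bre_matches("^foo",
- "a^foo") is expected to fail because the leading circumflex anchors.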
-
- ERE anchoring has been different from BRE anchoring in all historical 1
- systems. An unescaped anchor character has never matched its literal 1
- counterpart outside of a bracket expression. Some systems treated 1
- foo$bar as a valid expression that never matched anything, others treated 1
- it as invalid. POSIX.2 mandates the former, valid unmatched behavior. 1
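-
- The mandated ERE behavior can be sketched as follows, assuming a
- POSIX-style <regex.h>; ere_matches() is a hypothetical helper name. The
- pattern foo$bar is expected to compile but never match, and a literal
- dollar-sign has to be escaped or placed in a bracket expression.
-
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if `text` contains a match for the ERE `pattern`, 0 if it
 * does not, and -1 if the pattern does not compile. */
static int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```
-
- With this helper, ere_matches("foo$bar", "foo$bar") is expected to
- return 0 (valid, but no match), whereas ere_matches("foo\\$bar",
- "foo$bar") and ere_matches("foo[$]bar", "foo$bar") are expected to
- return 1.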
-
- Some systems have extended the BRE syntax to add alternation. For 1
- example, the subexpression \(foo$\|bar\) would match either foo at the 1
- end of the string or bar anywhere. The extension is triggered by the use 1
- of the undefined \| sequence. Because the BRE is undefined for portable 1
- scripts, the extending system is free to make other assumptions, such as 1
- that the $ represents the end-of-line anchor in the middle of a 1
- subexpression. If it were not for the extension, the $ would match a 1
- literal dollar-sign under the POSIX.2 rules. 1
-
-
- 2.8.6.4 Extended Regular Expressions Rationale. (This subclause is not a
- part of P1003.2)
-
- As with basic regular expressions, the working group decided to make the
- interpretation of escaped ordinary characters undefined.
-
- The right-parenthesis is not listed as an ERE special character because 1
- it is only special in the context of a preceding left-parenthesis. If 1
- found without a preceding left-parenthesis, the right-parenthesis has no 1
- special meaning. 1
-
- Based on objections in several ballots, the interval expression, {m,n},
- has been added to extended regular expressions. Historically, the
- interval expression has only been supported in some extended regular
- expression implementations. The working group estimated that the
- addition of interval expressions to extended regular expressions would
- not decrease consensus, and would also make basic regular expressions
- more of a subset of extended regular expressions than in many historical
- implementations.
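-
- As an illustration of the resulting parallel between the two syntaxes,
- the sketch below applies the same interval in ERE and BRE form. It
- assumes a POSIX-style <regex.h>; re_matches() is a hypothetical helper
- name whose cflags argument selects the syntax.
-
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if `text` contains a match for `pattern`, 0 if it does not,
 * and -1 if the pattern does not compile.  Pass REG_EXTENDED in `cflags`
 * for an ERE, or 0 for a BRE. */
static int re_matches(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```
-
- The ERE ^a{2,3}$ and the BRE ^a\{2,3\}$ are both expected to match
- ``aa'' and ``aaa'' and to reject ``a'' and ``aaaa''.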
-
- It was suggested that, in addition to interval expressions,
- backreferences (\n) also should be added to extended regular expressions.
- This was rejected by the working group as likely to decrease consensus.
-
- In historical implementations, multiple duplication symbols are usually
- interpreted from left to right and treated as additive. As an example,
- a+*b matches zero or more instances of a followed by a b. In POSIX.2,
- multiple duplication symbols are undefined; i.e., they cannot be relied
- upon for portable applications. One reason for this is to provide some
- scope for future enhancements; the current syntax is very crowded.
-
- The precedence of operations differs between EREs and those in lex; in
- lex, for historical reasons, interval expressions have a lower precedence
- than concatenation.
-
- 2.8.6.5 Regular Expression Grammar Rationale. (This subclause is not a
- part of P1003.2)
-
- None.
-
- END_RATIONALE
-
-
-
- 2.9 Dependencies on Other Standards
-
-
- 2.9.1 Features Inherited from POSIX.1
-
- This subclause describes some of the features provided by POSIX.1 {8}
- that are assumed to be globally available by all systems conforming to
- POSIX.2. This subclause does not attempt to detail all of the
- POSIX.1 {8} features that are required by all of the utilities and
- functions defined in this standard; the utility and function descriptions
- point out additional functionality required to provide the corresponding
- specific features needed by each.
-
- The following subclauses describe frequently used concepts. Utility and
- function description statements override these defaults when appropriate.
-
- BEGIN_RATIONALE
-
- 2.9.1.0.1 Features Inherited from POSIX.1 Rationale. (This subclause is
- not a part of P1003.2)
-
- It has been pointed out that POSIX.2 assumes that a lot of POSIX.1 {8}
- functionality is present, but never states exactly how much. This is an
- attempt to clarify the assumptions.
-
- This subclause only covers the ``utilities and functions defined by this
- standard.'' It does not mandate that the specific POSIX.1 {8} interfaces
- themselves be available to all application programs. A C language
- program compiled on a POSIX.2 system is not guaranteed that any of the
- POSIX.1 {8} functions are accessible. (For example, although UNIX
- system-based implementations of ls will use stat() to get file status, a
- POSIX.2 implementation of ls on a ``LONG_NAME_OS-based'' implementation
- might use the get_file_attributes() and the get_file_time_stamps() system
- calls.) POSIX.2 only requires equivalent functionality, not equal means
- of access. In any event, programs requiring the POSIX.1 {8} system
- interface should specify that they need POSIX.1 {8} conformance and not
- hope to achieve it by piggybacking on POSIX.2.
-
- END_RATIONALE
-
- 2.9.1.1 Process Attributes
-
- The following process attributes, as described in POSIX.1 {8}, are
- assumed to be supported for all processes in POSIX.2:
-
-    controlling terminal           real group ID
-    current working directory      real user ID
-    effective group ID             root directory
-    effective user ID              saved set-group-ID
-    file descriptors               saved set-user-ID
-    file mode creation mask        session membership
-    process ID                     supplementary group IDs
-    process group ID
-
- A conforming implementation may include additional process attributes.
-
- BEGIN_RATIONALE
-
- 2.9.1.1.1 Process Attributes Rationale. (This subclause is not a part of
- P1003.2)
-
- The supplementary group IDs requirement is minimal. If {NGROUPS_MAX} is
- defined to be zero, they are not required. If {NGROUPS_MAX} is greater
- than zero, the supplementary group IDs are used as described in
- POSIX.1 {8} in various permission checking operations.
-
- The saved-set-group-ID and saved-set-user-ID requirements are also
- minimal. If {_POSIX_SAVED_IDS} is defined, they are required; otherwise,
- they are not.
-
- A controlling terminal is needed to control access to /dev/tty.
-
- The file creation semantics of POSIX.2 require the effective group ID,
- effective user ID, and the file mode creation mask.
-
- Pathname resolution and access permission checks require the current
- working directory, effective group ID, effective user ID, and root
- directory.
-
- The kill utility requires the effective group ID, effective user ID,
- process ID, process group ID, real group ID, real user ID, saved set-
- group-ID, saved set-user-ID, and session membership attributes to perform
- the various signal addressing and permission checks.
-
- The id utility is based on the effective group ID, effective user ID,
- real group ID, real user ID, and supplementary group IDs.
-
- The following process attributes described in POSIX.1 {8} do not seem to
- be required by POSIX.2: parent process ID, pending signals, process
- signal mask, time left until an alarm clock signal, tms_cstime,
- tms_cutime, tms_stime, and tms_utime. There are probably other
- attributes mentioned in POSIX.1 {8} that are not listed here.
-
- END_RATIONALE
-
-
- 2.9.1.2 Concurrent Execution of Processes
-
- The following functionality of the POSIX.1 {8} fork() function shall be
- available on all POSIX.2 conformant systems:
-
- (1) Independent processes shall be capable of executing
- independently without either process terminating.
-
- (2) A process shall be able to create a new process with all of the
- attributes referenced in 2.9.1.1, determined according to the
- semantics of a call to the POSIX.1 {8} fork() function followed
- by a call in the child process to one of the POSIX.1 {8} exec
- functions.
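-
- On a system that provides the POSIX.1 interfaces directly, the process
- creation semantics referred to in item (2) correspond to the familiar
- sequence sketched below. This is an illustration only; POSIX.2 requires
- equivalent functionality, not these calls, and run_and_wait() is a
- hypothetical helper name.
-
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at `path` with argument vector `argv` in a new process
 * and return its exit status, or -1 if the process could not be created
 * or did not terminate normally. */
static int run_and_wait(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    assert(path != NULL && argv != NULL);
    pid = fork();                 /* child inherits the process attributes */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);        /* replaces the child's process image */
        _exit(127);               /* reached only if execv() failed */
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)       /* retry only when interrupted */
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```
-
- For example, calling run_and_wait("/bin/sh", args) with args set to
- { "sh", "-c", "exit 7", NULL } is expected to return 7 on a system where
- /bin/sh exists; that pathname is an assumption of the example.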
-
- BEGIN_RATIONALE
-
- 2.9.1.2.1 Concurrent Execution of Processes Rationale. (This subclause
- is not a part of P1003.2)
-
- The historical functionality of fork() is required, which permits the
- concurrent execution of independent processes. A system with a single
- thread of process execution is not an appropriate base upon which to
- build a POSIX.2 system. (This requirement was not explicitly stated in
- the 1988 POSIX.1, but is included in the current POSIX.1 {8}.)
-
- END_RATIONALE
-
- 2.9.1.3 File Access Permissions
-
- The file access control mechanism described by file access permissions in
- 2.2.2.55 applies to all files on a conforming POSIX.2 implementation.
-
- BEGIN_RATIONALE
-
- 2.9.1.3.1 File Access Permissions Rationale. (This subclause is not a
- part of P1003.2)
-
- The entire concept of file protections and access control is assumed to
- be handled as in POSIX.1 {8}.
-
- END_RATIONALE
-
- 2.9.1.4 File Read, Write, and Creation
-
- When a file is to be read or written, the file shall be opened with an
- access mode corresponding to the operation to be performed. If file
- access permissions deny access, the requested operation shall fail.
-
- When a file that does not exist is created, the following POSIX.1 {8}
- features shall apply unless the utility or function description states
- otherwise:
-
- (1) The file's user ID is set to the effective user ID of the
- calling process.
-
- (2) The file's group ID is set to the effective group ID of the
- calling process or the group ID of the directory in which the
- file is being created.
-
- (3) The file's permission bits are set to:
-
- S_IROTH | S_IWOTH | S_IRGRP | S_IWGRP | S_IRUSR | S_IWUSR
-
- (see POSIX.1 {8} 5.6.1.2) except that the bits specified by the
- process's file mode creation mask are cleared.
-
- (4) The st_atime, st_ctime, and st_mtime fields of the file shall be
- updated as specified in file times update in 2.2.2.69.
-
- (5) If the file is a directory, it shall be an empty directory;
- otherwise the file shall have length zero.
-
- (6) Unless otherwise specified, the file created shall be a regular
- file.
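-
- The creation rules above, in particular items (3), (5), and (6), can be
- sketched with the POSIX.1 open() function as follows. This is an
- illustration under the assumption that the POSIX.1 interfaces are
- directly available; DEFAULT_CREATE_MODE and create_and_get_mode() are
- hypothetical names.
-
```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* The permission bits requested for a newly created file, item (3). */
#define DEFAULT_CREATE_MODE \
    (S_IROTH | S_IWOTH | S_IRGRP | S_IWGRP | S_IRUSR | S_IWUSR)

/* Create `path`, which must not already exist, and return the permission
 * bits it actually received, or -1 on any failure.  The file mode
 * creation mask is applied by open() itself. */
static long create_and_get_mode(const char *path)
{
    struct stat st;
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, DEFAULT_CREATE_MODE);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode) || st.st_size != 0) {
        close(fd);                /* not an empty regular file */
        return -1;
    }
    close(fd);
    return (long)(st.st_mode & (S_IRWXU | S_IRWXG | S_IRWXO));
}
```
-
- With a file mode creation mask of 022, the helper is expected to return
- 0644: read and write for the owner, read only for group and others.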
-
- When an attempt is made to create a file that already exists, the action
- shall depend on the file type:
-
- (1) For directories and FIFO special files, the attempt shall fail
- and the utility shall either continue with its operation or exit
- immediately with a nonzero status, depending on the description
- of the utility.
-
- (2) For regular files:
-
- (a) The file's user ID, group ID, and permission bits shall
- not be changed.
-
- (b) The file shall be truncated to zero length.
-
- (c) The st_ctime and st_mtime fields shall be marked for
- update.
-
- (3) For other file types, the effect is implementation defined.
-
- When a file is to be appended, the file shall be opened in a manner
- equivalent to using the O_APPEND flag, without the O_TRUNC flag, in the
- POSIX.1 {8} _o_p_e_n() call.
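-
- The difference between re-creating an existing regular file and
- appending to it can be sketched as follows. The helper names are
- hypothetical, and the use of O_CREAT in the append case is an assumption
- of the example rather than something stated above.
-
```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open `path` for writing with the extra `flags`, write all of `data`,
 * and close the file.  Returns 0 on success, -1 on failure. */
static int write_with_flags(const char *path, int flags, const char *data)
{
    size_t len, done = 0;
    ssize_t n;
    int fd;

    assert(path != NULL && data != NULL);
    len = strlen(data);
    fd = open(path, O_WRONLY | O_CREAT | flags, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;
    while (done < len) {          /* write() may transfer fewer bytes */
        n = write(fd, data + done, len - done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    return close(fd);
}

/* Creating an existing regular file truncates it to zero length. */
static int overwrite_file(const char *path, const char *data)
{
    return write_with_flags(path, O_TRUNC, data);
}

/* Appending uses O_APPEND without O_TRUNC, so existing data is kept. */
static int append_file(const char *path, const char *data)
{
    return write_with_flags(path, O_APPEND, data);
}

/* Return the size of `path` in bytes, or -1 if it cannot be examined. */
static long file_size(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}
```
-
- Writing ``hello'' with overwrite_file(), then ``!!'' with append_file(),
- is expected to leave a 7-byte file; a further overwrite_file() call with
- ``x'' is expected to leave a 1-byte file.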
-
- BEGIN_RATIONALE
-
- 2.9.1.4.1 File Read, Write, and Creation Rationale. (This subclause is
- not a part of P1003.2)
-
- Even though it might be possible for a process to change the mode of a
- file to match a requested operation and change the mode back to its
- original state after the operation is completed, utilities are not
- allowed to do this unless the utility description states otherwise. As
- an example, the ed utility r command fails if the file to be read does
- not exist (even though it could create the file and then read it) or the
- file permissions do not allow read access [even though it could use the
- POSIX.1 {8} chmod() function to make the file readable before attempting
- to open the file].
-
- END_RATIONALE
-
-
- 2.9.1.5 File Removal
-
- When a directory that is the root directory or current working directory
- of any process is removed, the effect is implementation defined. If file
- access permissions deny access, the requested operation shall fail.
- Otherwise, when a file is removed:
-
- (1) Its directory entry shall be removed from the file system.
-
- (2) The link count of the file shall be decremented.
-
- (3) If the file is an empty directory (see 2.2.2.43):
-
- (a) If no process has the directory open, the space occupied
- by the directory shall be freed and the directory shall no
- longer be accessible.
-
- (b) If one or more processes have the directory open, the
- directory contents shall be preserved until all references
- to the file have been closed.
-
- (4) If the file is a directory that is not empty, the st_ctime field
- shall be marked for update.
-
- (5) If the file is not a directory:
-
- (a) If the link count becomes zero:
-
- [1] If no process has the file open, the space occupied
- by the file shall be freed and the file shall no
- longer be accessible.
-
- [2] If one or more processes have the file open, the
- file contents shall be preserved until all
- references to the file have been closed.
-
- (b) If the link count is not reduced to zero, the st_ctime
- field shall be marked for update.
-
- (6) The st_ctime and st_mtime fields of the containing directory
- shall be marked for update.
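-
- Item (5)(a)[2] above, under which the contents of a removed file are
- preserved while a process still has it open, can be sketched as follows.
- The example assumes direct access to the POSIX.1 interfaces;
- read_after_unlink() is a hypothetical helper name, and error cleanup is
- kept minimal.
-
```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path`, store `data` in it, remove its only directory entry
 * while it is still open, and read the data back through the open file
 * descriptor.  Returns the number of bytes read back, or -1 on failure. */
static long read_after_unlink(const char *path, const char *data)
{
    char buf[64];
    size_t len;
    ssize_t n = -1;
    int fd;

    assert(path != NULL && data != NULL);
    len = strlen(data);
    if (len > sizeof buf)
        return -1;
    fd = open(path, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) == (ssize_t)len  /* store the data */
        && unlink(path) == 0                  /* link count drops to zero */
        && lseek(fd, 0, SEEK_SET) == 0)       /* rewind the open file */
        n = read(fd, buf, sizeof buf);        /* contents still readable */
    close(fd);    /* last reference closed; the space may now be freed */
    return (long)n;
}
```
-
- After a successful call the pathname no longer resolves, yet the call
- is expected to have read back every byte that was written.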
-
- BEGIN_RATIONALE
-
- 2.9.1.5.1 File Removal Rationale. (This subclause is not a part of
- P1003.2)
-
- This is intended to be a summary of the POSIX.1 {8} unlink() and rmdir()
- requirements needed by POSIX.2.
-
- END_RATIONALE
-
-
- 2.9.1.6 File Time Values
-
- All files have the three time values described by file times update in
- 2.2.2.69.
-
- BEGIN_RATIONALE
-
- 2.9.1.6.1 File Time Values Rationale. (This subclause is not a part of
- P1003.2)
-
- All three time stamps specified by POSIX.1 {8} are needed for utilities
- like find, ls, make, test, and touch to work as expected.
-
- END_RATIONALE
-
- 2.9.1.7 File Contents
-
- When a reference is made to the contents of a file, pathname, this means
- the equivalent of all of the data placed in the space pointed to by buf
- when performing the read() function calls in the following POSIX.1 {8}
- operations:
-
- while (read (fildes, buf, nbytes) > 0)
- ;
-
- If the file is indicated by a pathname pathname, the file descriptor
- shall be determined by the equivalent of the following POSIX.1 operation:
-
- fildes = open (pathname, O_RDONLY);
-
- The value of nbytes in the above sequence is unspecified; if the file is
- of a type where the data returned by read() would vary with different
- values, the value shall be one that results in the most data being
- returned.
-
- If the read() function calls would return an error, it is unspecified
- whether the contents of the file are considered to include any data from
- offsets in the file beyond where the error would be returned.
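-
- The definition above can be turned into a complete routine as sketched
- below. The sketch makes choices the text leaves open: the request size
- of 4096 bytes is arbitrary, a read error is treated as failure of the
- whole operation, and file_contents() is a hypothetical helper name.
-
```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return a malloc()ed copy of the contents of `pathname` and store the
 * number of bytes in `*length`, or return NULL on failure.  The caller
 * frees the result. */
static char *file_contents(const char *pathname, size_t *length)
{
    char chunk[4096];             /* nbytes: an arbitrary choice */
    char *data, *grown;
    size_t used = 0;
    ssize_t n;
    int fildes;

    assert(pathname != NULL && length != NULL);
    fildes = open(pathname, O_RDONLY);      /* O_NONBLOCK is not set */
    if (fildes < 0)
        return NULL;
    data = malloc(1);             /* non-NULL result even for empty files */
    if (data == NULL) {
        close(fildes);
        return NULL;
    }
    /* Accumulate every block returned until end-of-file or an error. */
    while ((n = read(fildes, chunk, sizeof chunk)) > 0) {
        grown = realloc(data, used + (size_t)n);
        if (grown == NULL) {
            free(data);
            close(fildes);
            return NULL;
        }
        data = grown;
        memcpy(data + used, chunk, (size_t)n);
        used += (size_t)n;
    }
    close(fildes);
    if (n < 0) {                  /* read error: one permitted treatment */
        free(data);
        return NULL;
    }
    *length = used;
    return data;
}
```
-
- For a regular file holding the six bytes ``hello'' followed by a
- <newline>, the routine is expected to return those six bytes with
- *length set to 6.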
-
- BEGIN_RATIONALE
-
- 2.9.1.7.1 File Contents Rationale. (This subclause is not a part of
- P1003.2)
-
- This description is intended to convey the traditional behavior for all
- types of files. This matches the intuitive meaning for regular files,
- but the meaning is not always intuitive for other types of files. In
- particular, for FIFOs, pipes, and terminals it must be clear that the
- contents are not necessarily static at the time a file is opened, but
- they include the data returned by a sequence of reads until end-of-file
- is indicated. This is why the open() call is specified, with the
- O_NONBLOCK flag not set.
-
- Some files, especially character special files, are sensitive to the size
- of a read() request. The contents of the file are those resulting from
- proper choice of this size.
-
- END_RATIONALE
-
- 2.9.1.8 Pathname Resolution
-
- The pathname resolution algorithm described by pathname resolution in
- 2.2.2.104 shall be used by conforming POSIX.2 implementations. See also
- file hierarchy in 2.2.2.58.
-
- BEGIN_RATIONALE
-
- 2.9.1.8.1 Pathname Resolution Rationale. (This subclause is not a part
- of P1003.2)
-
- The whole concept of hierarchical file systems and pathname resolution is
- assumed to be handled as in POSIX.1 {8}.
-
- END_RATIONALE
-
-
- 2.9.1.9 Changing the Current Working Directory 2
-
- When the current working directory (see 2.2.2.159) is to be changed, 2
- unless the utility or function description states otherwise, the 2
- operation shall succeed unless a call to the POSIX.1 {8} chdir() function 2
- would fail when invoked with the new working directory pathname as its 2
- argument. 2
-
- 2.9.1.9.1 Changing the Current Working Directory Rationale. (This 2
- subclause is not a part of P1003.2) 2
-
- This subclause covers the access permissions and pathname structures 2
- involved with changing directories, such as with cd or (the UPE-extended) 2
- mailx utilities. 2
-
- 2.9.1.10 Establish the Locale
-
- The functionality of the POSIX.1 {8} setlocale() function is assumed to
- be available on all POSIX.2 conformant systems; i.e., utilities that
- require the capability of establishing an international operating
- environment shall be permitted to set the specified category of the
- international environment.
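-
- A utility written in C might establish one category of its
- international environment as sketched below. This is an illustration
- only; establish_locale() is a hypothetical helper name, and retaining
- the previous locale when the environment names an unsupported one is a
- choice of the example.
-
```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Set `category` from the environment (LANG and the LC_* variables) and
 * return the name of the locale now in effect for that category. */
static const char *establish_locale(int category)
{
    const char *name = setlocale(category, "");

    if (name == NULL)             /* unsupported locale: nothing changed */
        name = setlocale(category, NULL);   /* query the current setting */
    assert(name != NULL);         /* NULL here means an invalid category */
    return name;
}
```
-
- A utility that depends on character classification and collation would
- call establish_locale(LC_CTYPE) and establish_locale(LC_COLLATE) before
- processing its input.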
-
- BEGIN_RATIONALE
-
- 2.9.1.10.1 Establish the Locale Rationale. (This subclause is not a part
- of P1003.2)
-
- The entire concept of locale categories such as the LC_* variables along
- with any implementation-defined categories is assumed to be handled as in
- POSIX.1 {8}.
-
- END_RATIONALE
-
-
- 2.9.1.11 Actions Equivalent to POSIX.1 Functions
-
- Some utility descriptions specify that a utility performs actions
- equivalent to a POSIX.1 {8} function. Such specifications require only
- that the external effects be equivalent, not that any effect within the
- utility and visible only to the utility be equivalent.
-
- BEGIN_RATIONALE
-
- 2.9.1.11.1 Actions Equivalent to POSIX.1 Functions Rationale. (This
- subclause is not a part of P1003.2)
-
- An objection was received to an earlier draft that said this approach of
- equivalent functions was unreasonable, as the reader (and the person
- writing a test suite) would be responsible for interpreting which
- portions of POSIX.1 {8} were included and which were not. For example,
- would such intermediate effects as the setting of errno be required if
- the related POSIX.1 {8} function called for that? The answer is no:
- this standard is only concerned with the end results of functions against
- the file system and the environment, and not any intermediate values or
- results visible only to the programmer using the POSIX.1 {8} function in
- a C (or other high-level language) program.
-
- END_RATIONALE
-
-
- 2.9.2 Concepts Derived from the C Standard
-
- Some of the standard utilities perform complex data manipulation using
- their own procedure and arithmetic languages, as defined in their
- Extended Description or Operands subclauses. Unless otherwise noted, the
- arithmetic and semantic concepts (precision, type conversion, control
- flow, etc.) are equivalent to those defined in the C Standard {7}, as
- described in the following subclauses. Note that there is no requirement
- that the standard utilities be implemented in any particular programming
- language.
-
- BEGIN_RATIONALE
-
- 2.9.2.0.1 Concepts Derived from the C Standard Rationale. (This
- subclause is not a part of P1003.2)
-
- This subclause was introduced to answer complaints that there was
- insufficient detail presented by such utilities as awk or sh about their
- procedural control statements and their methods of performing arithmetic
- functions. Earlier drafts, derived heavily from the original manual
- pages, contained statements such as ``for loops similar to the
- C Standard {7},'' which was good enough for a general understanding, but
- insufficient for a real implementation.
-
- The C Standard {7} was selected as a model because most historical
- implementations of the standard utilities were written in C. Thus, it is
- more likely that they will act in a manner desired by POSIX.2 without
- modification.
-
- Using the C Standard {7} is primarily a notational convenience, so the
- many ``little languages'' in POSIX.2 would not have to be rigorously
- described in every aspect. Its selection does not require that the
- standard utilities be written in Standard C; they could be written in
- common-usage C, Ada, Pascal, assembler language, or anything else.
-
- The sizes of the various numeric values refer to C-language datatypes
- that are allowed to be different sizes by the C Standard {7}. Thus, like
- a C-language application, a shell application cannot rely on their exact
- size. However, it can rely on their minimum sizes expressed in the
- C Standard {7}, such as {LONG_MAX} for a long type.
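-
- As an illustration of relying only on the minimum size, the following
- sketch checks whether two nonnegative operands can be added without
- exceeding 2147483647, the smallest value the C Standard {7} permits for
- {LONG_MAX}. The function name can_add is hypothetical, and the sketch
- assumes a shell that provides $(( )) arithmetic expansion.
-
```shell
# can_add A B: succeed if A + B stays within 0..2147483647.
# Both operands are assumed to be decimal integers already in that range.
can_add() {
    # Subtracting first keeps every intermediate value within the range.
    [ "$1" -le "$((2147483647 - $2))" ]
}

can_add 2147483646 1 && echo "sum fits"
can_add 2147483647 1 || echo "sum would exceed the portable range"
```
-
- Performing the subtraction before the comparison is a design choice: it
- avoids computing a sum that might not be representable.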
-
- END_RATIONALE
-
-
- 2.9.2.1 Arithmetic Precision and Operations
-
- Integer variables and constants, including the values of operands and
- option-arguments, used by the standard utilities shall be implemented as
- equivalent to the C Standard {7} signed long data type; floating point
- shall be implemented as equivalent to the C Standard {7} double type.
- Conversions between types shall be as described in the C Standard {7}.
- All variables shall be initialized to zero if they are not otherwise
- assigned by the application's input.
-
- Arithmetic operators and functions shall be implemented as equivalent to
- those in the cited C Standard {7} section, as listed in Table 2-14.
-
- The evaluation of arithmetic expressions shall be equivalent to that
- described in the C Standard {7} section 3.3 Expressions.
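-
- The effect of these requirements can be observed from the shell. The
- following is a minimal sketch, assuming a shell that provides $(( ))
- arithmetic expansion; the function name c_arith_demo and the variables
- a and b are hypothetical and chosen only for illustration.
-
```shell
# Each line exercises one group of operators and prints the result.
c_arith_demo() {
    a=7 b=2
    echo "$(( a + b * 3 ))"       # multiplicative binds tighter: prints 13
    echo "$(( a / b ))"           # integer division truncates: prints 3
    echo "$(( a > b ))"           # relational operators yield 1 or 0: prints 1
    echo "$(( (a & 4) | 1 ))"     # bitwise AND, then inclusive OR: prints 5
    echo "$(( a > b ? a : b ))"   # conditional operator: prints 7
}
```
-
- Only positive operands are used here, so the results do not depend on
- how an implementation rounds the quotient of negative integers.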
-
- Table 2-14 - C Standard Operators and Functions
-
- _________________________________________________________________________
- Operation                         C Standard {7} Equivalent Reference
- _________________________________________________________________________
- ( )                               3.3.1 Primary Expressions
- postfix ++, postfix --            3.3.2 Postfix Operators
- unary +, unary -, prefix ++,      3.3.3 Unary Operators
-   prefix --, ~, !, sizeof()
- *, /, %                           3.3.5 Multiplicative Operators
- +, -                              3.3.6 Additive Operators
- <<, >>                            3.3.7 Bitwise Shift Operators
- <, <=, >, >=                      3.3.8 Relational Operators
- ==, !=                            3.3.9 Equality Operators
- &                                 3.3.10 Bitwise AND Operator
- ^                                 3.3.11 Bitwise Exclusive OR Operator
- |                                 3.3.12 Bitwise Inclusive OR Operator
- &&                                3.3.13 Logical AND Operator
- ||                                3.3.14 Logical OR Operator
- expr?expr:expr                    3.3.15 Conditional Operator
- =, *=, /=, %=, +=, -=,            3.3.16 Assignment Operators
-   <<=, >>=, &=, ^=, |=
- if ( ), if ( ) ... else,          3.6.4 Selection Statements
-   switch ( )
- while ( ), do ... while ( ),      3.6.5 Iteration Statements
-   for ( )
- goto, continue, break, return     3.6.6 Jump Statements
- _________________________________________________________________________
-
- 2.9.2.2 Mathematic Functions
-
- Any mathematic functions with the same names as those in the C Standard
- {7}'s sections:
-
- 4.5 Mathematics <math.h>
-
- 4.10.2 Pseudo-random sequence generation functions
-
- shall be implemented to return the results equivalent to those returned
- from a call to the corresponding C function described in the
- C Standard {7}.
-
-
- 2.10 Utility Conventions
-
-
- 2.10.1 Utility Argument Syntax
-
- This subclause describes the argument syntax of the standard utilities
- and introduces terminology used throughout the standard for describing
- the arguments processed by the utilities.
-
- Within the standard, a special notation is used for describing the syntax
- of a utility's arguments. Unless otherwise noted, all utility
- descriptions use this notation, which is illustrated by this example (see
- 3.9.1):
-
-
- utility_name [-a] [-b] [-c option_argument] [-d | -e]
- [-foption_argument] [operand ...]
-
- The notation used for the Synopsis subclauses imposes requirements on the
- implementors of the standard utilities and provides a simple reference
- for the reader of the standard.
-
- (1) The utility in the example is named utility_name. It is
- followed by options, option-arguments, and operands. The
- arguments that consist of hyphens and single letters or digits,
- such as -a, are known as options (or, historically, flags).
- Certain options are followed by an option-argument, as shown
- with [-c option_argument]. The arguments following the last
- options and option-arguments are named operands.
-
- (2) Option-arguments are sometimes shown separated from their
- options by <blanks>, sometimes directly adjacent. This reflects
- the situation that in some cases an option-argument is included
- within the same argument string as the option; in most cases it
- is the next argument. The Utility Syntax Guidelines in 2.10.2
- require that the option be a separate argument from its option-
- argument, but there are some exceptions in this standard to
- ensure continued operation of historical applications:
-
- (a) If the Synopsis of a standard utility shows a <space>
- between an option and option-argument (as with
- [-c option_argument] in the example), a conforming
- application shall use separate arguments for that option
- and its option-argument.
-
- (b) If a <space> is not shown (as with [-foption_argument] in
- the example), a conforming application shall place an
- option and its option-argument directly adjacent in the
- same argument string, without intervening <blank>s.
-
- (c) Notwithstanding the requirements on conforming
- applications, a conforming implementation shall permit,
- but shall not require, an application to specify options
- and option-arguments as separate arguments whether or not
- a <space> is shown on the synopsis line.
-
- (d) A standard utility may also be implemented to operate
- correctly when the required separation into multiple
- arguments is violated by a nonconforming application.
-
- (3) Options are usually listed in alphabetical order unless this
- would make the utility description more confusing. There are no
- implied relationships between the options based upon the order
- in which they appear, unless otherwise stated in the Options
- subclause, or unless the exception in 2.10.2 guideline 11
- applies. If an option that does not have option-arguments is
- repeated, the results are undefined, unless otherwise stated.
-
- (4) Frequently, names of parameters that require substitution by
- actual values are shown with embedded underscores.
- Alternatively, parameters are shown as follows:
-
- <parameter name>
-
- The angle brackets are used for the symbolic grouping of a
- phrase representing a single parameter and shall never be
- included in data submitted to the utility.
-
- (5) When a utility has only a few permissible options, they are
- sometimes shown individually, as in the example. Utilities with
- many flags generally show all of the individual flags (that do
- not take option-arguments) grouped, as in:
-
-
- utility_name [-abcDxyz] [-p arg] [operand]
-
- Utilities with very complex arguments may be shown as follows:
-
- utility_name [options] [operands]
-
- (6) Unless otherwise specified, whenever an operand or option-
- argument is or contains a numeric value:
-
- - The number shall be interpreted as a decimal integer.
-
- - Numerals in the range 0 to 2147483647 shall be syntactically
- recognized as numeric values.
-
- - When the utility description states that it accepts negative
- numbers as operands or option-arguments, numerals in the
- range -2147483647 to 2147483647 shall be syntactically
- recognized as numeric values.
-
- This does not mean that all numbers within the allowable range
- are necessarily semantically correct. A standard utility that
- accepts an option-argument or operand that is to be interpreted
- as a number, and for which a range of values smaller than that
- shown above is permitted by this standard, describes that
- smaller range along with the description of the option-argument
- or operand. If an error is generated, the utility's diagnostic
- message shall indicate that the value is out of the supported
- range, not that it is syntactically incorrect.
-
- (7) Arguments or option-arguments enclosed in the [ and ] notation
- are optional and can be omitted. The [ and ] symbols shall
- never be included in data submitted to the utility.
-
- (8) Arguments separated by the | vertical bar notation are mutually
- exclusive. The | symbols shall never be included in data
- submitted to the utility. Alternatively, mutually exclusive
- options and operands may be listed with multiple Synopsis lines.
- For example:
-
-
- utility_name -d [-a] [-c option_argument] [operand ...]
- utility_name -e [-b] [operand ...]
-
- When multiple synopsis lines are given for a utility, that is an
- indication that the utility has mutually exclusive arguments.
- These mutually exclusive arguments alter the functionality of
- the utility so that only certain other arguments are valid in
- combination with one of the mutually exclusive arguments. Only
- one of the mutually exclusive arguments is allowed for
- invocation of the utility. Unless otherwise stated in an
- accompanying Options subclause, the relationships between
- arguments depicted in the Synopsis subclauses are mandatory
- requirements placed on conforming applications. The use of
- conflicting mutually exclusive arguments produces undefined
- results, unless a utility description specifies otherwise. When
- an option is shown without the [ ] brackets, it means that
- option is required for that version of the Synopsis. However,
- it is not required to be the first argument, as shown in the
- example above, unless otherwise stated.
-
- (9) Ellipses (...) are used to denote that one or more occurrences
- of an option or operand are allowed. When an option or an
- operand followed by ellipses is enclosed in brackets, zero or
- more options or operands can be specified. The forms
-
-
- utility_name -f option_argument ... [operand ...]
- utility_name [-g option_argument] ... [operand ...]
-
- indicate that multiple occurrences of the option and its
- option-argument preceding the ellipses are valid, with semantics
- as indicated in the Options subclause of the utility. (See also
- Guideline 11 in 2.10.2.) In the first example, each option-
- argument requires a preceding -f and at least one
- -f option_argument must be given.
-
- (10) When the synopsis line is too long to be printed on a single
- line in this document, the indented lines following the initial
- line are continuation lines. An actual use of the command would
- appear on a single logical line.
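-
- The notation can be related to an actual parser. The following is a
- minimal sketch of how a shell script might accept the arguments of the
- example synopsis using the getopts utility. The function name
- parse_example, the variable names, and the output format are all
- hypothetical. Rejecting the combination of -d and -e is one possible
- choice; this standard leaves the result of conflicting arguments
- undefined.
-
```shell
# parse_example ARGS...
# Accepts: [-a] [-b] [-c option_argument] [-d | -e]
#          [-foption_argument] [operand ...]
parse_example() {
    mode=
    OPTIND=1
    while getopts abc:def: opt; do
        case $opt in
            a | b) printf 'flag: %s\n' "$opt" ;;
            c | f) printf 'option: %s argument: %s\n' "$opt" "$OPTARG" ;;
            d | e)
                # -d and -e are mutually exclusive in the synopsis.
                if [ -n "$mode" ] && [ "$mode" != "$opt" ]; then
                    echo 'usage: -d and -e are mutually exclusive' >&2
                    return 2
                fi
                mode=$opt ;;
            *) return 2 ;;   # getopts has already reported the error
        esac
    done
    shift $((OPTIND - 1))
    for operand in "$@"; do
        printf 'operand: %s\n' "$operand"
    done
}
```
-
- With this sketch, parse_example -ab -c one -ftwo -- -x reports the flags
- a and b, the option-arguments one and two, and the single operand -x.
- The option-argument is accepted whether it is adjacent to its option or
- supplied as the next argument.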
-
- BEGIN_RATIONALE
-
-
- 2.10.1.1 Utility Argument Syntax Rationale. (This subclause is not a
- part of P1003.2)
-
- This is the subclause where the definitions of option, option-argument,
- and operand come together.
-
- The working group felt that recent trends toward diluting the Synopsis
- subclauses of historical manual pages to something like:
-
- command [options] [operands]
-
- were a disservice to the reader. Therefore, considerable effort was
- placed into rigorous definitions of all the command line arguments and
- their interrelationships. The relationships depicted in the Synopses are
- normative parts of this standard; this information is sometimes repeated
- in textual form, but that is only for clarity within context.
-
- The use of ``undefined'' for conflicting argument usage and for repeated
- usage of the same option is meant to prevent portable applications from
- using conflicting arguments or repeated options, unless specifically
- allowed, as is the case with ls (which allows simultaneous, repeated use
- of the -C, -l, and -1 options). Many historical implementations will
- tolerate this usage, choosing either the first or the last applicable
- argument, and this tolerance can continue, but portable applications
- cannot rely upon it. (Other implementations may choose to print usage
- messages instead.)
-
- The use of ``undefined'' for conflicting argument usage also allows an
- implementation to make reasonable extensions to utilities where the
- implementor considers mutually exclusive options according to POSIX.2 to
- have a sensible meaning and result.
-
- POSIX.2 does not define the result of a utility when an option-argument
- or operand is not followed by ellipses and the application specifies more
- than one of that option-argument or operand. This allows an
- implementation to define valid (although nonstandard) behavior for the
- utility when more than one such option-argument or operand is specified.
-
- Allowing <blank>s after an option (i.e., placing an option and its
- option-argument into separate argument strings) when the standard does
- not require it encourages portability of users, while still preserving
- backward compatibility of scripts. Inserting <blank>s between the option
- and the option-argument is preferred; however, historical usage has not
- been consistent in this area; therefore, <blank>s are required to be
- handled by all implementations, but implementations are also allowed to
- handle the historical syntax. Another justification for selecting the
- multiple-argument method was that the single-argument case is inherently
- ambiguous when the option-argument can legitimately be a null string.
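-
- The ambiguity can be seen by counting the arguments that the shell
- actually passes. This is a small sketch; count_args is a hypothetical
- helper, not a standard utility.
-
```shell
# count_args ARGS...: print the number of arguments received.
count_args() {
    echo "$#"
}

count_args -f ''   # prints 2: the option and a null option-argument
count_args -f''    # prints 1: the null string vanishes, leaving only -f
```
-
- In the single-argument form the utility cannot tell a null
- option-argument from a missing one.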
-
- Wording was also added to explicitly state that digits are permitted as
- operands and option-arguments. The lower and upper bounds for the values
- of the numbers used for operands and option-arguments were derived from
- the C Standard {7} values for {LONG_MIN} and {LONG_MAX}. The requirement
- on the standard utilities is that numbers in the specified range do not
- cause a syntax error although the specification of a number need not be
- semantically correct for a particular operand or option-argument of a
- utility. For example, the specification of dd obs=3000000000 would yield
- undefined behavior for the application and would be a syntax error
- because the number 3000000000 is outside of the range -2147483647 to
- +2147483647. On the other hand, dd obs=2000000000 may cause some error,
- such as ``blocksize too large,'' rather than a syntax error.
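-
- The distinction drawn above can be sketched as a three-way
- classification of a nonnegative numeral. The function name
- classify_numeral and its output words are hypothetical. The comparison
- is done on digit strings so that the sketch itself never handles a value
- larger than 2147483647.
-
```shell
# classify_numeral VALUE
# Prints "syntax" if VALUE is not a decimal numeral, "range" if it lies
# beyond 2147483647 (an implementation need not recognize it), and "ok"
# if it must be recognized syntactically as a numeric value.
classify_numeral() {
    case $1 in
        '' | *[!0-9]*) echo syntax; return 0 ;;
    esac
    n=$1
    # Drop leading zeros, keeping one digit, so length reflects magnitude.
    while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do
        n=${n#0}
    done
    if [ "${#n}" -lt 10 ]; then
        echo ok
    elif [ "${#n}" -gt 10 ]; then
        echo range
    else
        # Exactly ten digits: compare the first nine and the last one.
        head=${n%?}
        last=${n#"$head"}
        if [ "$head" -gt 214748364 ] ||
           { [ "$head" -eq 214748364 ] && [ "$last" -gt 7 ]; }; then
            echo range
        else
            echo ok
        fi
    fi
}
```
-
- For the dd examples above, classify_numeral 3000000000 prints range and
- classify_numeral 2000000000 prints ok; in the latter case the utility
- may still reject the value on semantic grounds.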
-
- END_RATIONALE
-
-
- 2.10.2 Utility Syntax Guidelines
-
- The following guidelines are established for the naming of utilities and
- for the specification of options, option-arguments, and operands. Clause
- 7.5 describes a function that assists utilities in handling options and
- operands that conform to these guidelines.
-
- Operands and option-arguments can contain characters not specified in
- 2.4.
-
- The guidelines are intended to provide guidance to the authors of future
- utilities. Some of the standard utilities do not conform to all of these
- guidelines; in those cases, the Options subclauses describe the
- deviations.
-
- Guideline 1: Utility names should be between two and nine
- characters, inclusive.
-
- Guideline 2: Utility names should include lowercase letters (the
- lower character classification) from the set
- described in 2.4 and digits only.
-
- Guideline 3: Each option name should be a single alphanumeric
- character (the alnum character classification) from
- the set described in 2.4. The -W (capital-W) option
- shall be reserved for vendor extensions.
-
- NOTE: The other alphanumeric characters are subject
- to standardization in the future, based on historical
- usage. Implementors should be aware that future
- POSIX working groups may offer little sympathy to
- vendors with isolated extensions in conflict with
- future drafts.
-
- Guideline 4: All options should be preceded by the '-' delimiter
- character.
-
- Guideline 5: Options without option-arguments should be accepted
- when grouped behind one '-' delimiter.
-
- Guideline 6: Each option and option-argument should be a separate
- argument, except as noted in 2.10.1, item (2).
-
- Guideline 7: Option-arguments should not be optional.
-
- Guideline 8: When multiple option-arguments are specified to
- follow a single option, they should be presented as a
- single argument, using commas within that argument or
- <blank>s within that argument to separate them.
-
- Guideline 9: All options should precede operands on the command
- line.
-
- Guideline 10: The argument "--" should be accepted as a delimiter
- indicating the end of options. Any following
- arguments should be treated as operands, even if they
- begin with the '-' character. The "--" argument
- should not be used as an option or as an operand.
-
- Guideline 11: The order of different options relative to one
- another should not matter, unless the options are
- documented as mutually exclusive and such an option
- is documented to override any incompatible options
- preceding it. If an option that has option-arguments
- is repeated, the option and option-argument
- combinations should be interpreted in the order
- specified on the command line.
-
- Guideline 12: The order of operands may matter and position-related
- interpretations should be determined on a utility-
- specific basis.
-
- Guideline 13: For utilities that use operands to represent files to
- be opened for either reading or writing, the "-"
- operand should be used only to mean standard input
- (or standard output when it is clear from context
- that an output file is being specified).
-
- Any utility claiming conformance to these guidelines shall conform
- completely to these guidelines, as if these guidelines contained the term
- ``shall'' instead of ``should,'' except that the utility is permitted to
- accept usage in violation of these guidelines for backward compatibility
- as long as the required form is also accepted.
-
- Guidelines 1 and 2 are offered as guidance for locales using Latin
- alphabets. No recommendations are made by this standard concerning
- utility naming in other locales.
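-
- Guidelines 1 and 2 are simple enough to check mechanically. The
- following is a sketch under the assumption that names are drawn from
- the Latin lowercase letters a to z and the digits; the function name
- valid_utility_name is hypothetical.
-
```shell
# valid_utility_name NAME: succeed if NAME has two to nine characters,
# all of them lowercase letters or digits.
valid_utility_name() (
    # Run in a subshell with the C locale so that a-z means exactly
    # the 26 unaccented lowercase letters.
    LC_ALL=C
    case $1 in
        *[!a-z0-9]*) exit 1 ;;
    esac
    [ "${#1}" -ge 2 ] && [ "${#1}" -le 9 ]
)
```
-
- For example, the names ls and c89 pass this check, while Make (an
- uppercase letter) and x (too short) do not.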
-
- BEGIN_RATIONALE
-
-
- 2.10.2.1 Utility Syntax Guidelines Rationale. (This subclause is not a
- part of P1003.2)
-
- This subclause is based on the rules listed in the SVID. It was included
- for two reasons:
-
- (1) The individual utility descriptions in Sections 4, 5, and 6, and
- Annexes A and C needed a set of common (although not universal)
- actions on which they could anchor their descriptions of option
- and operand syntax. Most of the standard utilities actually do
- use these guidelines, and many of their historical
- implementations use the getopt() function for their parsing.
- Therefore, it was simpler to cite the rules and merely identify
- exceptions.
-
- (2) Writers of portable applications need suggested guidelines if
- the POSIX community is to avoid the chaos of historical UNIX
- system command syntax.
-
- It is recommended that all future utilities and applications use these
- guidelines to enhance ``user portability.'' The fact that some
- historical utilities could not be changed (to avoid breaking existing
- applications) should not deter this future goal.
-
- The voluntary nature of the guidelines is highlighted by repeated uses of
- the word ``should'' throughout. This usage should not be misinterpreted to
- imply that utilities that claim conformance in their Options subclauses
- do not always conform.
-
- Guideline 2 recommends the naming of utilities. In 3.9.1, it is further
- stated that a command used in the shell command language cannot be named
- with a trailing colon.
-
- Guideline 3 was changed to allow alphanumeric characters (letters and
- digits) from the character set to allow compatibility with historical
- usage. Historical practice allows the use of digits wherever practical;
- and there are no portability issues that would prohibit the use of
- digits. In fact, from an internationalization viewpoint, digits (being
- nonlanguage dependent) are preferable over letters (a ``-2'' is
- intuitively self-explanatory to any user, while in the ``-f filename''
- the letter f is a mnemonic aid only to speakers of Latin-based languages
- where ``filename'' happens to translate to a word that begins with f).
- Since guideline 3 still retains the word ``single,'' multidigit options
- are not allowed. Instances of historical utilities that used them have
- been marked obsolescent in this standard, with the numbers being changed
- from option names to option-arguments.
-
- It is difficult to come up with a satisfactory solution to the problem of
- namespace in option characters. When the POSIX.2 group desired to extend
- the historical cc utility to accept C Standard {7} programs, it found
- that all of the portable alphabet was already in use by various vendors.
- Thus, it had to devise a new name, c89, rather than something like cc -X.
- There were suggestions that implementors be restricted to providing
- extensions through various means (such as using a plus-sign as the option
- delimiter or using option characters outside the alphanumeric set) that
- would reserve all of the remaining alphanumeric characters for future
- POSIX standards. These approaches were resisted because they lacked the
- historical style of UNIX. Furthermore, if a vendor-provided option
- should become commonly used in the industry, it would be a candidate for
- standardization. It would be desirable to standardize such a feature
- using existing practice for the syntax (the semantics can be standardized
- with any syntax). This would not be possible if the syntax was one
- reserved for the vendor. However, since the standardization process may
- lead to minor changes in the semantics, it may prove to be better for a
- vendor to use a syntax that will not be affected by standardization. As
- a compromise, the following statements are made by the developers of
- POSIX.2:
-
- - In future revisions to this standard, and in other POSIX standards,
- every attempt will be made to develop new utilities and features
- that conform to the Utility Syntax Guidelines.
-
- - Future extensions and additions to POSIX standards will not use the
- -W (capital W) option. This option is forever reserved to
- implementors for extensions, in a manner reminiscent of the
- option's use in historical versions of the cc utility. The other
- alphanumeric characters are subject to standardization in the
- future, based on historical usage.
-
- Implementors should be cognizant of these intentions and aware that
- future POSIX working groups will offer little sympathy to vendors with
- extensions in conflict with future drafts. In the first version of
- POSIX.2, vendors held a virtual veto power when conflicts arose with
- their extensions; in the future, POSIX working groups may be less
- concerned about preserving isolated extensions that conflict with these
- statements of intent.
-
- Guideline 8 includes the concept of comma-separated lists in a single
- argument. It is up to the utility to parse such a list itself because
- getopt() just returns the single string. This situation was retained so
- that certain historical utilities wouldn't violate the guidelines.
- Applications preparing for international use should be aware of an
- occasional problem with comma-separated lists: in some locales, the
- comma is used as the radix character. Thus, if an application is
- preparing operands for a utility that expects a comma-separated list, it
- should avoid generating noninteger values through one of the means that
- is influenced by setting the LC_NUMERIC variable [such as awk, bc,
- printf, or printf()].
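-
- Because the list arrives as one string, the utility has to split it. A
- minimal sketch of such splitting in the shell follows; the function name
- split_list is hypothetical. It does not address the radix-character
- problem, which has to be avoided when the list is generated.
-
```shell
# split_list LIST: print each comma-separated element on its own line.
# The body runs in a subshell so the IFS and set -f changes do not leak.
split_list() (
    IFS=,
    set -f                  # no pathname expansion on the split fields
    for item in $1; do
        printf '%s\n' "$item"
    done
)
```
-
- For example, split_list uid,gid,mode prints uid, gid, and mode on three
- separate lines.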
-
- Applications calling any utility with a first operand starting with "-"
- should usually specify "--", as indicated by Guideline 10, to mark the
- end of the options. This is true even if the Synopsis in this standard
- does not specify any options; implementations may provide options as
- extensions to this standard. The standard utilities that do not support
- Guideline 10 indicate that fact in the Options subclause of the utility
- description.
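-
- As a brief sketch of this advice from the application's side, the
- following hypothetical wrapper removes files whose names are not known
- in advance and may begin with a hyphen. It assumes an rm utility that
- supports Guideline 10.
-
```shell
# remove_named FILE...: remove the named files, treating every argument
# as an operand even when it begins with '-'.
remove_named() {
    rm -- "$@"
}
```
-
- Without the "--" argument, a file named -n would be taken as an option
- rather than as an operand.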
-
- Guideline 11 was modified to clarify that the order of different options
- should not matter relative to one another. However, the order of
- repeated options that also have option-arguments may be significant;
- therefore, such options are required to be interpreted in the order that
- they are specified. The make utility is an instance of a historical
- utility that uses repeated options in which the order is significant.
- Multiple files are specified by giving multiple instances of the -f
- option, for example:
-
- make -f common_header -f specific_rules target
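-
- A parser that honors this ordering only has to process the options as
- they are encountered. The following sketch uses the getopts utility;
- the function name list_makefiles is hypothetical and the code is not
- taken from any make implementation.
-
```shell
# list_makefiles ARGS...: print the option-argument of every -f option,
# one per line, in the order the options appear on the command line.
list_makefiles() {
    OPTIND=1
    while getopts f: opt; do
        case $opt in
            f) printf '%s\n' "$OPTARG" ;;
            *) return 2 ;;
        esac
    done
}
```
-
- Called as list_makefiles -f common_header -f specific_rules target, it
- prints common_header followed by specific_rules.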
-
- Guideline 13 does not imply that all of the standard utilities
- automatically accept the operand "-" to mean standard input or output,
- nor does it specify the actions of the utility upon encountering multiple
- "-" operands. It simply says that, by default, "-" operands shall not be
- used for other purposes in the file reading/writing [but not stat()ing,
- unlink()ing, touching, etc.] utilities. All information concerning
- actual treatment of the "-" operand is found in the individual utility
- clauses.
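-
- A script that chooses to follow Guideline 13 for an input file might do
- so along the following lines. This is only a sketch; the function name
- show_input is hypothetical.
-
```shell
# show_input FILE: copy FILE to standard output; "-" means standard input.
show_input() {
    if [ "$1" = - ]; then
        cat
    else
        # Redirection avoids passing the name to cat as an argument,
        # so a name beginning with '-' is never taken as an option.
        cat < "$1"
    fi
}
```
-
- How several "-" operands in one invocation should behave is left open
- here, as it is in the guideline.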
-
- An area of concern that was expressed during the balloting process was
- that as implementations mature, implementation-defined utilities and
- implementation-defined utility options will result. The notion was
- expressed that there needed to be a standard way, say an environment
- variable or some such mechanism, to identify implementation-defined
- utilities separately from standard utilities that may have the same name.
- It was decided that there already exist several ways of dealing with this
- situation and that it is outside of the scope of the standard to attempt
- to standardize in the area of nonstandard items. A method that exists on
- some historical implementations is the use of the so-called /local/bin or
- /usr/local/bin directory to separate local or additional copies or
- versions of utilities. Another method that is also used is to isolate
- utilities into completely separate domains. Still another method to
- ensure that the desired utility is being used is to request the utility
- by its full pathname. There are, to be sure, many approaches to this
- situation; the examples given above serve to illustrate that there is
- more than one.
-
- END_RATIONALE
-
-
- 2.11 Utility Description Defaults
-
- This clause describes all of the subclauses used within the utility
- clauses in Section 4 and the other sections that describe standard
- utilities. It describes:
-
- (1) Intended usage of the subclause.
-
- (2) Global defaults that affect all the standard utilities.
-
- BEGIN_RATIONALE
-
-
- 2.11.0.1 Utility Description Defaults Rationale. (This subclause is not
- a part of P1003.2)
-
- This clause is arranged with headings in the same order as all the
- utility descriptions. It is a collection of related and unrelated
- information concerning:
-
- (1) The default actions of utilities.
-
- (2) The meanings of notations used in the standard that are specific
- to individual utility subclauses.
-
- Although this material may seem out of place in Section 2, it is
- important that this information appear before any of the utilities to be
- described later. Unfortunately, since the utilities are split into
- multiple major sections (chapters), this information could not be placed
- into any one of those sections without confusing cross references.
-
- END_RATIONALE
-
-
- 2.11.1 Synopsis
-
- The Synopsis subclause summarizes the syntax of the calling sequence for
- the utility, including options, option-arguments, and operands.
- Standards for utility naming are described in 2.10.2; for describing the
- utility's arguments in 2.10.1.
-
-
- 2.11.2 Description
-
- The Description subclause describes the actions of the utility. If the
- utility has a very complex set of subcommands or its own procedural
- language, an Extended Description subclause is also provided. Most
- explanations of optional functionality are omitted here, as they are
- usually explained in the Options subclause.
-
- Some utilities in this standard are described in terms of equivalent
- POSIX.1 {8} functionality. As explained in 1.1, a fully conforming
- POSIX.1 {8} base is not a prerequisite for this standard. When specific
- functions are cited, the underlying operating system shall provide
- equivalent functionality and all side effects associated with successful
- execution of the function. The treatment of errors and intermediate
- results from the individual functions cited are generally not specified
- by this standard. See the utility's Exit Status and Consequences of
- Errors subclauses for all actions associated with errors encountered by
- the utility.
-
-
- 2.11.3 Options
-
- The Options subclause describes the utility options and option-arguments,
- and how they modify the actions of the utility. Standard utilities that
- have options either fully comply with 2.10.2 or describe all
- deviations. Apparent disagreements between functionality descriptions in
- the Options and Description (or Extended Description) subclauses are
- always resolved in favor of the Options subclause.
-
- Each Options subclause that uses the phrase ``The ... utility shall
- conform to the utility argument syntax guidelines ...'' refers only to
- the use of the utility as specified by this standard; implementation
- extensions should also conform to the guidelines, but may allow
- exceptions for historical practice.
-
- Unless otherwise stated in the utility description, when given an option
- unrecognized by the implementation, or when a required option-argument is
- not provided, standard utilities shall issue a diagnostic message to
- standard error and exit with a nonzero exit status.
-
- Default Behavior: When this subclause is listed as ``None,'' it means
- that the implementation need not support any options. Standard utilities
- that do not accept options, but that do accept operands, shall recognize
- "--" as a first argument to be discarded.
-
- BEGIN_RATIONALE
-
-
- 2.11.3.1 Options Rationale. (This subclause is not a part of P1003.2)
-
- Although it has not always been possible, the working group has tried to
- avoid repeating information, thereby reducing the risk that duplicate
- explanations are later modified and fall out of sync.
-
- The requirement to recognize -- exists because portable applications need
- a way to shield their operands from any arbitrary options that the
- implementation may provide as an extension. For example, if the standard
- utility foo is listed as taking no options, and the application needed to
- give it a pathname with a leading hyphen, it could safely do it as:
-
- foo -- -myfile
-
- and avoid any problems with -m used as an extension.
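-
- As a minimal sketch of the same technique with a real utility, the
- following assumes cat in place of the hypothetical foo. The helper name
- show_dash_file, the scratch directory, and the file contents are
- illustrative only and do not come from this standard.
-

```shell
# Hypothetical helper: create a file whose name begins with a hyphen and
# read it back, using "--" so that the name is treated as an operand.
show_dash_file() {
    dir=${TMPDIR:-/tmp}/dashdemo.$$
    mkdir -p "$dir" || return 1
    (
        cd "$dir" || exit 1
        printf 'hello\n' > ./-myfile   # create the awkwardly named file
        cat -- -myfile                 # "--" marks the end of the options
    )
    status=$?
    rm -rf "$dir"
    return "$status"
}
```

-
- Without the --, an implementation would be free to interpret -myfile as
- a group of options rather than as a pathname.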
-
- END_RATIONALE
-
-
- 2.11.4 Operands
-
- The Operands subclause describes the utility operands, and how they
- affect the actions of the utility. Apparent disagreements between
- functionality descriptions in the Operands and Description (or Extended
- Description) subclauses are always resolved in favor of the Operands
- subclause.
-
- If an operand naming a file can be specified as -, which means to use the
- standard input instead of a named file, this shall be explicitly stated
- in this subclause. Unless otherwise stated, the use of multiple
- instances of - to mean standard input in a single command produces
- unspecified results.
-
- Unless otherwise stated, the standard utilities that accept operands
- shall process those operands in the order specified in the command line.
-
- Default Behavior: When this subclause is listed as ``None,'' it means
- that the implementation need not support any operands.
-
- BEGIN_RATIONALE
-
-
- 2.11.4.1 Operands Rationale. (This subclause is not a part of P1003.2)
-
- This usage of - is never shown in the Synopsis. Similarly, the usage of
- -- described in 2.11.3 is never shown.
-
- The requirement for processing operands in command line order is to avoid
- a ``WeirdNIX'' utility that might choose to sort the input files
- alphabetically, by size, or by directory order. Although this might be
- acceptable for some utilities, in general the programmer has a right to
- know exactly what order will be chosen.
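-
- The ordering requirement can be illustrated with a short sketch. It
- assumes cat as the example utility; the helper name concat_in_order and
- the file names are invented for the illustration.
-

```shell
# Hypothetical helper: concatenate three small files in the order given.
concat_in_order() {
    dir=${TMPDIR:-/tmp}/orderdemo.$$
    mkdir -p "$dir" || return 1
    printf 'b\n' > "$dir/zeta"
    printf 'a\n' > "$dir/alpha"
    printf 'c\n' > "$dir/mid"
    # Operands are processed as written, not sorted by name or by size.
    cat "$dir/zeta" "$dir/alpha" "$dir/mid"
    status=$?
    rm -rf "$dir"
    return "$status"
}
```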
-
- Some of the standard utilities take multiple file operands and act as if
- they were processing the concatenation of those files. For example,
-
- asa file1 file2 and cat file1 file2 | asa
-
- have similar results when questions of file access, errors, and
- performance are ignored. Other utilities, such as grep or wc, have
- completely different results in these two cases. This latter type of
- utility is always identified in its Description or Operands subclauses,
- whereas the former is not. Although it might be possible to create a
- general assertion about the former case, the following points must be
- addressed:
-
- - Access times for the files might be different in the operand case
- versus the cat case.
-
- - The utility may have error messages that are cognizant of the input
- file name and this added value should not be suppressed. (As an
- example, awk sets a variable with the file name at each file
- boundary.)
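-
- A minimal sketch of the difference between the two forms, assuming
- wc -l as the example. The helper name compare_wc_forms, the scratch
- directory, and the file contents are illustrative assumptions.
-

```shell
# Hypothetical helper: count the output lines that wc produces in the
# operand form and in the concatenation form.
compare_wc_forms() {
    dir=${TMPDIR:-/tmp}/wcdemo.$$
    mkdir -p "$dir" || return 1
    printf 'one\ntwo\n' > "$dir/file1"
    printf 'three\n' > "$dir/file2"
    # Operand form: one line per file plus a line for the total.
    operand_lines=$(wc -l "$dir/file1" "$dir/file2" | wc -l)
    # Concatenation form: a single count for the combined stream.
    pipe_lines=$(cat "$dir/file1" "$dir/file2" | wc -l | wc -l)
    rm -rf "$dir"
    # The arithmetic expansion strips any padding wc puts around a number.
    echo "$(( $operand_lines )) $(( $pipe_lines ))"
}
```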
-
- END_RATIONALE
-
-
- 2.11.5 External Influences
-
- The External Influences subclause describes all input data that is
- specified by the invoker, data received from the environment, and other
- files or databases that may be used by the utility. There are four
- subclauses that contain all the substantive information about external
- influences; because of this, this level of header is always left blank.
-
- Certain of the standard utilities describe how they can invoke other
- utilities or applications, such as by passing a command string to the
- command interpreter. The external requirements of such invoked utilities
- are not described in the subclause concerning the standard utility that
- invokes them.
-
-
- 2.11.5.1 Standard Input
-
- The Standard Input subclause describes the standard input of the utility.
- This subclause is frequently merely a reference to the following
- subclause, because many utilities treat standard input and input files in
- the same manner. Unless otherwise stated, all restrictions described in
- Input Files apply to this subclause as well.
-
- Use of a terminal for standard input may cause any of the standard
- utilities that read standard input to stop when used in the background.
- For this reason, applications should not use interactive features in
- scripts to be placed in the background.
-
- The specified standard input format of the standard utilities shall not
- depend on the existence or value of the environment variables defined in
- this standard, except as provided by this standard.
-
- Default Behavior: When this subclause is listed as ``None,'' it means
- that the standard input shall not be read when the utility is used as
- described by this standard.
-
- BEGIN_RATIONALE
-
- 2.11.5.1.1 Standard Input Rationale. (This subclause is not a part of
- P1003.2)
-
- This subclause was globally renamed from Standard Input Format in
- previous drafts to better reflect its role in describing the existence
- and usage of the file, in addition to its format.
-
- END_RATIONALE
-
-
- 2.11.5.2 Input Files
-
- The Input Files subclause describes the files, other than the standard
- input, used as input by the utility. It includes files named as operands
- and option-arguments as well as other files that are referred to, such as
- startup/initialization files, databases, etc. Commonly-used files are
- generally described in one place and cross-referenced by other utilities.
-
- Some of the standard utilities, such as filters, process input files a
- line or a block at a time and have no restrictions on the maximum input
- file size. Some utilities may have size limitations that are not as
- obvious as file space or memory limitations. Such limitations should
- reflect resource limitations of some sort, not arbitrary limits set by
- implementors. Implementations shall define in the conformance
- documentation those utilities that are limited by constraints other than
- file system space, available memory, and other limits specifically cited
- by this standard, and identify what the constraint is, and indicate a way
- of estimating when the constraint would be reached. Similarly, some
- utilities descend the directory tree (recursively). Implementations
- shall also document any limits that they may have in descending the
- directory tree that are beyond limits cited by this standard.
-
- When a standard utility reads a seekable input file and terminates
- without an error before it reaches end-of-file, the utility shall ensure
- that the file offset in the open file description is properly positioned
- just past the last byte processed by the utility. For files that are not
- seekable, the state of the file offset in the open file description for
- that file is unspecified.
-
- When an input file is described as a text file, the utility produces
- undefined results if given input that is not from a text file, unless
- otherwise stated. Some utilities (e.g., make, read, and sh) allow for
- continued input lines using an escaped <newline> convention; unless
- otherwise stated, the utility need not be able to accumulate more than
- {LINE_MAX} bytes from a set of multiple, continued input lines. If a
- utility using the escaped <newline> convention detects an end-of-file
- condition immediately after an escaped <newline>, the results are
- unspecified.
-
- Record formats are described in a notation similar to that used by the C
- language function, printf(). See 2.12 for a description of this
- notation.
-
- Default Behavior: When this subclause is listed as ``None,'' it means
- that no input files are required to be supplied when the utility is used
- as described by this standard.
-
- BEGIN_RATIONALE
-
- 2.11.5.2.1 Input Files Rationale. (This subclause is not a part of
- P1003.2)
-
- This subclause was globally renamed from Input File Formats in previous
- drafts to better reflect its role in describing the existence and usage
- of the files, in addition to their format.
-
- The description of file offsets answers the question: Are the following
- three commands equivalent?
-
- tail -n +2 file
- (sed -n 1q; cat) < file
- cat file | (sed -n 1q; cat)
-
- The answer is that a conforming application cannot assume they are
- equivalent. The second command is equivalent to the first only when the
- file is seekable. In the third command, if the file offset in the open
- file description were not unspecified, sed would have to be implemented
- so that it read from the pipe one byte at a time or it would have to
- employ some method to seek backwards on the pipe. Such functionality is
- not defined currently in POSIX.1 {8} and does not exist on all historical
- systems. Other utilities, such as head, read, and sh, have similar
- properties, so the restriction is described globally in this clause. A
- future revision to this standard may require that the standard utilities
- leave the file offset in a consistent state for pipes as well as regular
- files.
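-
- The seekable case can be sketched with read, one of the utilities named
- above as having this property. The helper name skip_first_line is an
- illustrative assumption; the sketch relies on the input being a regular
- (seekable) file supplied by redirection.
-

```shell
# Hypothetical helper: print all but the first line of a regular file.
# read consumes line 1 and leaves the shared file offset just past it,
# so cat continues from the start of line 2.
skip_first_line() {
    { read -r _first; cat; } < "$1"
}
```

-
- For a regular file the output is expected to match that of
- tail -n +2 file; when the input is a pipe, the text above leaves the
- result unspecified.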
-
- The description of conformance documentation about file sizes follows
- many changes of direction by the working group. Originally, there
- appeared a limit, {ED_FILE_MAX}, intended to impose a minimum file size
- on ed, which has been historically limited to relatively small files.
- This received objections from various members who said that such a limit
- merely invited sloppy programming; there should be no limits to a
- ``well-written'' ed. Thus, Draft 8 removed the limit and inserted
- rationale that this meant ed would have to process files of virtually
- unlimited size. (Surprisingly, no objections or comments were received
- about that sentence.) However, in discussing the matter with
- representatives of POSIX.3, it turned out that omitting the limit meant
- that a corresponding test assertion would also be omitted and no test
- suite could legitimately stress ed with large files. It quickly became
- clear that restrictions applied to other utilities as well and a solution
- was needed.
-
- It is not possible for this standard to judge which utilities are in the
- category with arbitrary file size limits; this would impose too much on
- implementors. Therefore, the burden is placed on implementors to
- publicly document any limitations and the resulting pressure in the
- marketplace should keep most implementations adequate for most portable
- applications. Typically, larger systems would have larger limits than
- smaller systems, but since price typically follows function, the user can
- select a machine that handles his/her problems reasonably given such
- information. The working group considered adding a limit in 2.13.1 for
- every file-oriented utility, but felt these limits would not actually be
- used by real applications and would reduce consensus. This is
- particularly true for utilities, such as possibly awk or yacc, that might
- have rather complex limits not directly related to the actual file size.
-
- The definition of text file (see 2.2.2.151) is strictly enforced for
- input to the standard utilities; very few of them list exceptions to the
- undefined results called for here. (Of course, ``undefined'' here does
- not mean that existing implementations necessarily have to change to
- start indicating error conditions. Conforming applications cannot rely
- on implementations succeeding or failing when nontext files are used.)
-
- The utilities that allow line continuation are generally those that
- accept input languages, rather than pure data. It would be unusual for
- an input line of this type to exceed {LINE_MAX} bytes and unreasonable to
- require that the implementation allow unlimited accumulation of multiple
- lines, each of which could reach {LINE_MAX}. Thus, for a portable
- application the total of all the continued lines in a set cannot exceed
- {LINE_MAX}.
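-
- An application author could check an input file against this limit
- along the following lines. This is a sketch only: the helper name
- check_continued_lines is hypothetical, and it makes the conservative
- assumption that every byte of every physical line in a continued set,
- including each backslash and <newline>, counts toward the total.
-

```shell
# Hypothetical helper: report each set of continued lines whose total
# size in bytes exceeds a limit.
# usage: check_continued_lines LIMIT < file
check_continued_lines() {
    LC_ALL=C awk -v limit="$1" '
        {
            if (!pending) { first = NR; total = 0 }
            total += length($0) + 1      # +1 for the <newline> of this line
            pending = ($0 ~ /\\$/)       # escaped <newline>: the set continues
            if (!pending && total > limit)
                printf "line %d: %d bytes\n", first, total
        }'
}
```

-
- A typical use would pass the value reported by getconf LINE_MAX as the
- limit.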
-
- The format description is intended to be sufficiently rigorous to allow
- other applications to generate these input files. However, since
- <blank>s can legitimately be included in some of the fields described by
- the standard utilities, particularly in locales other than the POSIX
- Locale, this intent is not always realized.
-
- END_RATIONALE
-
-
- 2.11.5.3 Environment Variables
-
- The Environment Variables subclause lists what variables affect the
- utility's execution.
-
- The entire manner in which environment variables described in this
- standard affect the behavior of each utility is described in the
- Environment Variables subclause for that utility, in conjunction with the
- global effects of the LANG and LC_ALL environment variables described in
- 2.6. The existence or value of environment variables described in this
- standard shall not otherwise affect the specified behavior of the
- standard utilities. Any effects of the existence or value of environment
- variables not described by this standard upon the standard utilities are
- unspecified.
-
- For those standard utilities that use environment variables as a means
- for selecting a utility to execute (such as CC in make), the string
- provided to the utility shall be subjected to the path search described
- for PATH in 2.6.
-
- Default Behavior: When this subclause is listed as ``None,'' it means
- that the behavior of the utility is not directly affected by environment
- variables described by this standard when the utility is used as
- described by this standard.
-
- BEGIN_RATIONALE
-
- 2.11.5.3.1 Environment Variables Rationale. (This subclause is not a
- part of P1003.2)
-
- The global default text about the PATH search is overkill in this version
- of POSIX.2 (prior to the UPE) because only one of the standard utilities
- specifies variables in this way--make's $(CC), $(LEX), etc. It is
- described here mostly in anticipation of its heavier usage in POSIX.2a.
- The description of PATH indicates separately that names including slashes
- do not apply, so they do not apply here either.
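-
- The effect of the search can be approximated from the shell. The sketch
- below is an assumption-laden illustration: the helper name resolve_tool
- is hypothetical, and it relies on a shell that provides command -v,
- which this draft does not itself require.
-

```shell
# Hypothetical helper: show which file a utility name would resolve to.
# Names containing a slash are used as given, without a PATH search.
resolve_tool() {
    case $1 in
        */*) [ -x "$1" ] && printf '%s\n' "$1" ;;
        *)   command -v "$1" ;;
    esac
}
```

-
- For example, resolve_tool "${CC:-c89}" would print the pathname that a
- PATH search selects for the compiler named by CC, if one is found.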
-
- END_RATIONALE
-
-
- 2.11.5.4 Asynchronous Events
-
- The Asynchronous Events subclause lists how the utility reacts to such
- events as signals and what signals are caught.
-
- Default Behavior: When this subclause is listed as ``Default,'' or it
- refers to ``the standard action for all other signals; see 2.11.5.4,'' it
- means that the action taken as a result of the signal shall be one of the
- following:
-
- (1) The action is that inherited from the parent according to the
- rules of inheritance of signal actions defined in POSIX.1 {8}
- (see 2.9.1), or
-
- (2) When no action has been taken to change the default, the default
- action is that specified by POSIX.1 {8}, or
-
- (3) The result of the utility's execution is as if default actions
- had been taken.
-
- BEGIN_RATIONALE
-
- 2.11.5.4.1 Asynchronous Events Rationale. (This subclause is not a part
- of P1003.2)
-
- Because there is no language prohibiting it, a utility is permitted to
- catch a signal, perform some additional processing (such as deleting
- temporary files), restore the default signal action (or action inherited
- from the parent process) and resignal itself.
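-
- The sequence described above can be sketched in the shell language.
- The names worker, on_term, and run_worker, and the temporary file name,
- are illustrative assumptions; the worker sends SIGTERM to itself only
- to stand in for a signal arriving from another process.
-

```shell
# Hypothetical worker, kept in a variable so it can run as its own process.
worker='
tmp=${TMPDIR:-/tmp}/resignal.$$
: > "$tmp"
on_term() {
    rm -f "$tmp"         # additional processing: delete the temporary file
    trap - TERM          # restore the default action for SIGTERM
    kill -s TERM "$$"    # resignal itself so the parent sees the signal
}
trap on_term TERM
printf "%s\n" "$tmp"
kill -s TERM "$$"        # stands in for a signal sent from elsewhere
exit 1                   # reached only if the resignal did not terminate us
'

# Run the worker; print the temporary file name and the status seen by
# the parent.
run_worker() {
    name=$(sh -c "$worker")
    status=$?
    printf '%s %s\n' "$name" "$status"
}
```

-
- Because the default action is restored before the utility resignals
- itself, the parent observes termination by the signal rather than an
- ordinary exit.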
-
- END_RATIONALE
-
-
- 2.11.6 External Effects
-
- The External Effects subclause describes the effects of the utility on
- the operational environment, including the file system. There are three
- subclauses that contain all the substantive information about external
- effects; because of this, this level of header is usually left blank.
-
- Certain of the standard utilities describe how they can invoke other
- utilities or applications, such as by passing a command string to the
- command interpreter. The external effects of such invoked utilities are
- not described in the subclause concerning the standard utility that
- invokes them.
-
-
- 2.11.6.1 Standard Output
-
- The Standard Output subclause describes the standard output of the
- utility. This subclause is frequently merely a reference to the
- following subclause, Output Files, because many utilities treat standard
- output and output files in the same manner.
-
- Use of a terminal for standard output may cause any of the standard
- utilities that write standard output to stop when used in the background.
- For this reason, applications should not use interactive features in
- scripts to be placed in the background.
-
- Record formats are described in a notation similar to that used by the C
- language function, printf(). See 2.12 for a description of this
- notation.
-
- The specified standard output of the standard utilities shall not depend
- on the existence or value of the environment variables defined in this
- standard, except as provided by this standard.
-
- Default Behavior: When this subclause is listed as ``None,'' it means
- that the standard output shall not be written when the utility is used as
- described by this standard.
-
- BEGIN_RATIONALE
-
- 2.11.6.1.1 Standard Output Rationale. (This subclause is not a part of
- P1003.2)
-
- This subclause was globally renamed from Standard Output Format in
- previous drafts to better reflect its role in describing the existence
- and usage of the file, in addition to its format.
-
- The format description is intended to be sufficiently rigorous to allow
- post-processing of output by other programs, particularly by an awk or
- lex parser.
-
- END_RATIONALE
-
-
- 2.11.6.2 Standard Error
-
- The Standard Error subclause describes the standard error output of the
- utility. Only those messages that are purposely sent by the utility are
- described.
-
- Use of a terminal for standard error may cause any of the standard
- utilities that write standard error output to stop when used in the
- background. For this reason, applications should not use interactive
- features in scripts to be placed in the background.
-
- The format of diagnostic messages for most utilities is unspecified, but
- the language and cultural conventions of diagnostic and informative
- messages whose format is unspecified by this standard should be affected
- by the setting of LC_MESSAGES.
-
- The specified standard error output of standard utilities shall not
- depend on the existence or value of the environment variables defined in
- this standard, except as provided by this standard.
-
- Default Behavior: When this subclause is listed as ``Used only for
- diagnostic messages,'' it means that, unless otherwise stated, the
- diagnostic messages shall be sent to the standard error only when the
- exit status is nonzero and the utility is used as described by this
- standard.
-
- When this subclause is listed as ``None,'' it means that the standard
- error shall not be used when the utility is used as described in this
- standard.
-
- BEGIN_RATIONALE
-
- 2.11.6.2.1 Standard Error Rationale. (This subclause is not a part of
- P1003.2)
-
- This subclause was globally renamed from Standard Error Format in
- previous drafts to better reflect its role in describing the existence
- and usage of the file, in addition to its format.
-
- This subclause does not describe error messages that refer to incorrect
- operation of the utility. Consider a utility that processes program
- source code as its input. This subclause is used to describe messages
- produced by a correctly operating utility that encounters an error in the
- program source code that it is processing. However, a message
- indicating that the utility had insufficient memory in which to operate
- would not be described.
-
- Some compilers have traditionally produced warning messages without
- returning a nonzero exit status; these are specifically noted in their
- subclauses. Other utilities are expected to remain absolutely quiet on
- the standard error if they want to return zero, unless the implementation
- provides some sort of extension to increase the verbosity or debugging
- level.
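-
- One way a test script might look for violations of this expectation is
- sketched below. The helper name noisy_success is hypothetical, and the
- check is only as meaningful as the command line it is given.
-

```shell
# Hypothetical checker: succeed (return zero) only when the command wrote
# something to standard error and still returned a zero exit status.
noisy_success() {
    # 2>&1 first sends standard error to the capture; >/dev/null then
    # discards standard output.
    err=$("$@" 2>&1 >/dev/null)
    status=$?
    [ "$status" -eq 0 ] && [ -n "$err" ]
}
```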
-
- The format descriptions are intended to be sufficiently rigorous to allow
- post-processing of output by other programs.
-
- END_RATIONALE
-
-
- 2.11.6.3 Output Files
-
- The Output Files subclause describes the files created or modified by the
- utility. Temporary or system files that are created for internal usage
- by this utility or other parts of the implementation (spool, log, audit
- files, etc.) are not described in this, or any, subclause. The
- utilities creating such files and the names of such files are
- unspecified. If applications are written to use temporary or
- intermediate files, they should use the TMPDIR environment variable, if
- it is set and represents an accessible directory, to select the location
- of temporary files.
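-
- As an informative sketch of this recommendation for applications, the
- following selects a location for temporary files. The helper name
- temp_dir and the choice of /tmp as the fallback are assumptions of the
- sketch, not requirements of this standard.
-

```shell
# Hypothetical helper: choose the directory for an application's
# temporary files. The /tmp fallback is an assumption of this sketch.
temp_dir() {
    if [ -n "${TMPDIR:-}" ] && [ -d "$TMPDIR" ] &&
       [ -w "$TMPDIR" ] && [ -x "$TMPDIR" ]; then
        printf '%s\n' "$TMPDIR"     # set and accessible: use it
    else
        printf '%s\n' /tmp
    fi
}
```

-
- An application could then name a file as "$(temp_dir)/myapp.$$" so that
- concurrent instances are distinguished by process ID.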
-
- Implementations shall ensure that temporary files, when used by the
- standard utilities, are named so that different utilities or multiple
- instances of the same utility can operate simultaneously without regard
- to their working directories, or any other process characteristic other
- than process ID. There are two exceptions to this requirement:
-
- (1) Resources for temporary files other than the namespace (for
- example, disk space, available directory entries, or number of
- processes allowed) are not guaranteed.
-
- (2) Certain standard utilities generate output files that are
- intended as input for other utilities, (for example, lex
- generates lex.yy.c) and these cannot have unique names. These
- cases are explicitly identified in the descriptions of the
- respective utilities.
-
- Any temporary files created by the implementation shall be removed by the
- implementation upon a utility's successful exit, exit because of errors,
- or before termination by any of the SIGHUP, SIGINT, or SIGTERM signals,
- unless specified otherwise by the utility description.
-
- Record formats are described in a notation similar to that used by the C
- language function, printf(). See 2.12 for a description of this
- notation.
-
- Default Behavior: When this subclause is listed as ``None,'' it means
- that no files are created or modified as a consequence of direct action
- on the part of the utility when the utility is used as described by this
- standard. However, the utility may create or modify system files, such
- as log files, that are outside of the utility's normal execution
- environment.
-
- BEGIN_RATIONALE
-
- 2.11.6.3.1 Output Files Rationale. (This subclause is not a part of
- P1003.2)
-
- This subclause was globally renamed from Output File Formats in previous
- drafts to better reflect its role in describing the existence and usage
- of the files, in addition to their format.
-
- The format description is intended to be sufficiently rigorous to allow
- post-processing of output by other programs, particularly by an awk or
- lex parser.
-
- Receipt of the SIGQUIT signal should generally cause termination (unless
- in some debugging mode) that would bypass any attempted recovery actions.
-
- END_RATIONALE
-
-
- 2.11.7 Extended Description
-
- The Extended Description subclause provides a place for describing the
- actions of very complicated utilities, such as text editors or language
- processors, which typically have elaborate command languages.
-
- Default Behavior: When this subclause is listed as ``None,'' no further
- description is necessary.
-
-
- 2.11.8 Exit Status
-
- The Exit Status subclause describes the values the utility shall return
- to the calling program, or shell, and the conditions that cause these
- values to be returned. Usually, utilities return zero for successful
- completion and values greater than zero for various error conditions. If
- specific numeric values are listed in this subclause, conforming
- implementations shall use those values for the errors described. In some
- cases, status values are listed more loosely, such as ``>0.'' A Strictly
- Conforming POSIX.2 Application shall not rely on any specific value in
- the range shown and shall be prepared to receive any value in the range.
-
- Unspecified error conditions may be represented by specific values not
- listed in the standard.
-
- BEGIN_RATIONALE
-
-
- 2.11.8.1 Exit Status Rationale. (This subclause is not a part of
- P1003.2)
-
- Note the additional discussion of exit status values in 3.8.2. It
- describes requirements for returning exit values > 125.
-
- A utility may list zero as a successful return, 1 as a failure for a
- specific reason, and >1 as ``an error occurred.'' In this case,
- unspecified conditions may cause a 2 or 3, or other value, to be
- returned. A Strictly Conforming POSIX.2 Application should be written so
- that it tests for successful exit status values (zero in this case),
- rather than relying upon the single specific error value listed in the
- standard. In that way, it will have maximum portability, even on
- implementations with extensions.
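-
- A minimal sketch of this advice, assuming grep as the utility being
- invoked. The helper name search_result and the messages it prints are
- illustrative.
-

```shell
# Hypothetical helper: report whether a pattern occurs in a file.
# Only the zero (success) status is tested; every nonzero value is
# handled alike, whichever specific value the implementation returns.
search_result() {
    if grep -q -e "$1" "$2" 2>/dev/null; then
        echo "found"
    else
        echo "not found or error"
    fi
}
```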
-
- The working group is aware that the general nonenumeration of errors
- makes it difficult to write test suites that test the incorrect operation
- of utilities. There are some historical implementations that have
- expended effort to provide detailed status messages and a helpful
- environment to bypass or explain errors, such as prompting, retrying, or
- ignoring unimportant syntax errors; other implementations have not.
- Since there is no realistic way to mandate system behavior in cases of
- undefined application actions or system problems--in a manner acceptable
- to all cultures and environments--attention has been limited to the
- correct operation of utilities by the conforming application.
- Furthermore, the portable application does not need detailed information
- concerning errors that it caused through incorrect usage or that it
- cannot correct anyway. The high degree of competition in the emerging
- POSIX marketplace should ensure that users requiring friendly, resilient
- environments will be able to purchase such without detailed specification
- in this standard.
-
- There is no description of defaults for this subclause because all of the
- standard utilities specify something (or explicitly state
- ``Unspecified'') for Exit Status.
-
- END_RATIONALE
-
-
- 2.11.9 Consequences of Errors
-
- The Consequences of Errors subclause describes the effects on the
- environment, file systems, process state, etc., when error conditions
- occur. It does not describe error messages produced or exit status
- values used.
-
- The many reasons for failure of a utility are generally not specified by
- the utility descriptions. Utilities may terminate prematurely if they
- encounter: invalid usage of options, arguments, or environment
- variables; invalid usage of the complex syntaxes expressed in Extended
- Description subclauses; difficulties accessing, creating, reading, or
- writing files; or, difficulties associated with the privileges of the
- process.
-
- The following shall apply to each utility, unless otherwise stated:
-
- - If the requested action cannot be performed on an operand
- representing a file, directory, user, process, etc., the utility
- shall issue a diagnostic message to standard error and continue
- processing the next operand in sequence, but the final exit status
- shall be returned as nonzero.
-
- - If the requested action characterized by an option or option-
- argument cannot be performed, the utility shall issue a diagnostic
- message to standard error and the exit status returned shall be
- nonzero.
-
- - When an unrecoverable error condition is encountered, the utility
- shall exit with a nonzero exit status.
-
- - A diagnostic message shall be written to standard error whenever an
- error condition occurs.
-
- Default Behavior: When this subclause is listed as ``Default,'' it means
- that any changes to the environment are unspecified.
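-
- As an illustration only, the default rules above can be sketched in the
- shell command language. The function name process_operands and its
- readability check are hypothetical stand-ins for a real utility's
- action; neither is defined by this standard.
-
```shell
# Hypothetical utility body: apply an action to each operand in turn.
process_operands() {
    rc=0
    for operand in "$@"; do
        if [ -r "$operand" ]; then
            : # the requested action would be performed here
        else
            # The diagnostic goes to standard error; processing continues
            # with the next operand in sequence.
            printf '%s: cannot read %s\n' process_operands "$operand" >&2
            rc=1
        fi
    done
    # The final exit status is nonzero if any operand failed.
    return "$rc"
}
```
-
- Invoked with one readable and one unreadable operand, the sketch writes
- one diagnostic to standard error, still processes the readable operand,
- and returns a nonzero status.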
-
- BEGIN_RATIONALE
-
-
- 2.11.9.1 Consequences of Errors Rationale. (This subclause is not a part
- of P1003.2)
-
- When a utility encounters an error condition several actions are
- possible, depending on the severity of the error and the state of the
- utility. Included in the possible actions of various utilities are:
- deletion of temporary or intermediate work files; deletion of incomplete
- files; validity checking of the file system or directory.
-
- In Draft 9, most of the Consequences of Errors subclauses were changed to
- ``Default.'' This is due to the more elaborate description of the
- default case now carried in this subclause and the fact that most of the
- standard utilities actually use that default.
-
- END_RATIONALE
-
- BEGIN_RATIONALE
-
-
- 2.11.10 Rationale
-
- This subclause provides historical perspective and justification of
- working group actions concerning the utility.
-
- Examples, Usage
-
- This subclause provides examples and usage of the utility. In some cases
- certain characters are interpreted as special characters to the shell.
- In the rest of the standard, these characters are shown without escape
- characters or quoting (see 3.2). In all examples, however, quoting has
- been used, showing how sample commands (utility names combined with
- arguments) could be passed correctly to a shell (see sh in 4.56) or as a
- string to the system() function.
-
- History of Decisions Made
-
- This subclause provides historical perspective for decisions that were
- made.
-
- Unresolved Objections
-
- These subclauses were removed from Draft 10. The Unresolved Objections
- are maintained in a separate list and do not meet ISO editing
- requirements for an informative annex.
-
-
- 2.11.10.1 Rationale Rationale. (This subclause is not a part of P1003.2)
-
- The Rationale subclauses will be moved to Annex E in the final POSIX.2.
- Some of the subheadings may be collapsed in that document; in these
- drafts the working group has not always been very rigorous about what is
- a description of usage versus a history of decisions made, for example.
- The final rationale will de-emphasize the chronological aspects of
- working group decisions.
-
- END_RATIONALE
-
- 2.12 File Format Notation
-
- The Standard Input, Standard Output, Standard Error, Input Files, and
- Output Files subclauses of the utility descriptions, when provided, use a
- syntax to describe the data organization within the files, when that
- organization is not otherwise obvious. The syntax is similar to that
- used by the C language printf() function, as described in this clause.
- When used in Standard Input or Input Files subclauses of the utility
- descriptions, this syntax describes the format that could have been used
- to write the text to be read, not a format that could be used by the C
- language scanf() function to read the input file.
-
- The description of an individual record is as follows:
-
- "<format>", [ <arg1>, <arg2>, ..., <argn> ]
-
- The format is a character string that contains three types of objects
- defined below:
-
- characters Characters that are not escape sequences or conversion
- specifications, as described below, shall be copied to the
- output.
-
- escape sequences
- Represent nongraphic characters.
-
- conversion specifications
- Specify the output format of each argument. (See
- below.)
-
- The following characters have the following special meaning in the format
- string:
-
- " " (An empty character position.) One or more <blank>
- characters.
-
- W Exactly one <space> character.
-
- The escape-sequences in Table 2-15 depict the associated action on
- display devices capable of the action.
-
- Each conversion specification shall be introduced by the percent-sign
- character (%). After the character %, the following shall appear in
- sequence:
-
- flags Zero or more flags, in any order, that modify the meaning
- of the conversion specification.
-
-                      Table 2-15 - Escape Sequences
- _________________________________________________________________________
-  Escape     Represents
-  Sequence   Character            Terminal Action
- _________________________________________________________________________
-
-  \\         backslash            None.
-  \a         <alert>              Attempts to alert the user through
-                                  audible or visible notification.
-  \b         <backspace>          Moves the printing position to one
-                                  column before the current position,
-                                  unless the current position is the
-                                  start of a line.
-  \f         <form-feed>          Moves the printing position to the
-                                  initial printing position of the next
-                                  logical page.
-  \n         <newline>            Moves the printing position to the
-                                  start of the next line.
-  \r         <carriage-return>    Moves the printing position to the
-                                  start of the current line.
-  \t         <tab>                Moves the printing position to the
-                                  next tab position on the current
-                                  line. If there are no more tab
-                                  positions left on the line, the
-                                  behavior is undefined.
-  \v         <vertical tab>       Moves the printing position to the
-                                  start of the next vertical tab
-                                  position. If there are no more
-                                  vertical tab positions left on the
-                                  page, the behavior is undefined.
- _________________________________________________________________________
-
-
- field width An optional string of decimal digits to specify a minimum
- field width. For an output field, if the converted value
- has fewer bytes than the field width, it shall be padded
- on the left [or right, if the left-adjustment flag (-),
- described below, has been given] to the field width.
-
- precision Gives the minimum number of digits to appear for the d, o,
- i, u, x, or X conversions (the field shall be padded with
- leading zeroes); the number of digits to appear after the
- radix character for the e and f conversions; the maximum
- number of significant digits for the g conversion; or the
- maximum number of bytes to be written from a string in s
- conversion. The precision shall take the form of a period
- (.) followed by a decimal digit string; a null digit
- string shall be treated as zero.
-
- conversion characters
- A conversion character (see below) that indicates the type
- of conversion to be applied.
-
- The flag characters and their meanings are:
-
- - The result of the conversion shall be left-justified
- within the field.
-
- + The result of a signed conversion always shall begin with
- a sign (+ or -).
-
- <space> If the first character of a signed conversion is not a
- sign, a <space> shall be prefixed to the result. This
- means that if the <space> and + flags both appear, the
- <space> flag shall be ignored.
-
- # The value is to be converted to an ``alternate form.''
- For c, d, i, u, and s conversions, the behavior is
- undefined. For o conversion, it shall increase the
- precision to force the first digit of the result to be a
- zero. For x or X conversion, a nonzero result shall have
- 0x or 0X prefixed to it, respectively. For e, E, f, g and
- G conversions, the result shall always contain a radix
- character, even if no digits follow the radix character.
- For g and G conversions, trailing zeroes shall not be
- removed from the result as they usually are.
-
- 0 For d, i, o, u, x, X, e, E, f, g, and G conversions,
- leading zeroes (following any indication of sign or base)
- shall be used to pad to the field width; no space padding
- shall be performed. If the 0 and - flags both appear, the
- 0 flag shall be ignored. For d, i, o, u, x, and X
- conversions, if a precision is specified, the 0 flag shall
- be ignored. For other conversions, the behavior is
- undefined.
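-
- The effect of each flag can be observed with the printf utility, which
- accepts formats of this kind. The following is a minimal sketch, not
- normative text; the function name flag_demo and the sample values are
- illustrative only.
-
```shell
# Each result is wrapped in [] so that the padding is visible.
flag_demo() {
    printf '[%5d]\n' 42     # no flag: padded on the left to the field width
    printf '[%-5d]\n' 42    # - flag: left-justified within the field
    printf '[%+d]\n' 42     # + flag: result always begins with a sign
    printf '[% d]\n' 42     # <space> flag: a <space> is prefixed if no sign
    printf '[%05d]\n' 42    # 0 flag: leading zeroes pad to the field width
    printf '[%#o]\n' 8      # # flag, o conversion: first digit is a zero
    printf '[%#x]\n' 255    # # flag, x conversion: 0x is prefixed
}
flag_demo
```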
-
- Each conversion character shall result in fetching zero or more
- arguments. The results are undefined if there are insufficient arguments
- for the format. If the format is exhausted while arguments remain, the
- excess arguments shall be ignored.
-
- The conversion characters and their meanings are:
-
- d,i,o,u,x,X The integer argument shall be written as signed decimal (d
- or i), unsigned octal (o), unsigned decimal (u), or
- unsigned hexadecimal notation (x and X). The d and i
- specifiers shall convert to signed decimal in the style
- [-]dddd. The x conversion shall use the numbers and
- letters 0123456789abcdef and the X conversion shall use
- the numbers and letters 0123456789ABCDEF. The precision
- component of the argument shall specify the minimum number
- of digits to appear. If the value being converted can be
- represented in fewer digits than the specified minimum, it
- shall be expanded with leading zeroes. The default
- precision shall be 1. The result of converting a zero
- value with a precision of 0 shall be no characters. If
- both the field width and precision are omitted, the
- implementation may precede and/or follow numeric arguments
- of types d, i, and u with <blank>s; arguments of type o
- (octal) may be preceded with leading zeroes.
-
- f The floating point number argument shall be written in
- decimal notation in the style "[-]ddd.ddd", where the
- number of digits after the radix character (shown here as
- a decimal point) shall be equal to the precision
- specification. The LC_NUMERIC locale category shall
- determine the radix character to use in this format. If
- the precision is omitted from the argument, six digits
- shall be written after the radix character; if the
- precision is explicitly 0, no radix character shall
- appear.
-
- e,E The floating point number argument shall be written in the
- style "[-]d.ddde±dd" (the symbol ± indicates either a plus
- or minus sign), where there is one digit before the radix
- character (shown here as a decimal point) and the number
- of digits after it is equal to the precision. The
- LC_NUMERIC locale category shall determine the radix
- character to use in this format. When the precision is
- missing, six digits shall be written after the radix
- character; if the precision is 0, no radix character shall
- appear. The E conversion character shall produce a number
- with E instead of e introducing the exponent. The
- exponent always shall contain at least two digits.
- However, if the value to be written requires an exponent
- greater than two digits, additional exponent digits shall
- be written as necessary.
-
- g,G The floating point number argument shall be written in
- style f or e (or in style E in the case of a G conversion
- character), with the precision specifying the number of
- significant digits. The style used depends on the value
- converted: style e shall be used only if the exponent
- resulting from the conversion is less than -4 or greater
- than or equal to the precision. Trailing zeroes shall be
- removed from the result. A radix character shall appear
- only if it is followed by a digit.
-
- c The integer argument shall be converted to an unsigned
- char and the resulting byte shall be written.
-
- s The argument shall be taken to be a string and bytes from
- the string shall be written until the end of the string or
- the number of bytes indicated by the precision
- specification of the argument is reached. If the
- precision is omitted from the argument, it shall be taken
- to be infinite, so all bytes up to the end of the string
- shall be written.
-
- % Write a % character; no argument shall be converted.
-
- In no case does a nonexistent or insufficient field width cause
- truncation of a field; if the result of a conversion is wider than the
- field width, the field shall be simply expanded to contain the conversion
- result.
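-
- Several of the conversions, and the rule that a field width never
- truncates, can be tried with the printf utility. This is a sketch under
- stated assumptions: conversion_demo is a hypothetical name, and the
- function body runs in a subshell with LC_ALL set to C so that the radix
- character is a period.
-
```shell
# The ( ) body runs in a subshell, so the locale change does not leak.
conversion_demo() (
    LC_ALL=C; export LC_ALL
    printf '%d %o %x %X\n' 255 255 255 255   # decimal, octal, hexadecimal
    printf '%.3d\n' 7            # precision: minimum number of digits
    printf '%.2f\n' 3.14159      # two digits after the radix character
    printf '%.3e\n' 12345.678    # one digit before the radix character
    printf '%.3s\n' abcdef       # precision: maximum bytes from a string
    printf '%d%%\n' 50           # %% writes a % character
    printf '[%2d]\n' 12345       # width too small: the field expands
)
conversion_demo
```
-
- In the POSIX locale the sketch writes 255 377 ff FF, 007, 3.14,
- 1.235e+04, abc, 50%, and [12345], one per line.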
-
- BEGIN_RATIONALE
-
-
- 2.12.1 File Format Notation Rationale. (This subclause is not a part of
- P1003.2)
-
- This clause was originally derived from the description of printf() in
- the SVID, but it has been updated following the publication of the
- C Standard {7}. It is not identical to the C Standard's {7} printf(), as
- it deals with integers as being essentially one type, disregarding
- possible internal differences between int, short, and long. It has also
- had some of the internal C language dependencies removed (such as the
- requirement for null-terminated strings).
-
- This standard provides a rigorous description of the format of utility
- input and output files. It is the intention of this standard that these
- descriptions be adequate sources of information so that portable
- applications can use other utilities such as lex or awk to reliably parse
- the output of these utilities as their input in, say, a pipeline.
-
- The notation for spaces allows some flexibility for application output.
- Note that an empty character position in format represents one or more
- <blank> characters on the output (not white space, which can include
- <newline>s). Therefore, another utility that reads that output as its
- input must be prepared to parse the data using scanf(), awk, etc. The W
- character is used when exactly one <space> is output.
-
- The treatment of integers and spaces is different from the real printf(),
- in that they can be surrounded with <blank>s. This was done so that,
- given a format such as:
-
- "%d\n", <foo>
-
- the implementation could use a real printf() such as
-
- printf("%6d\n", foo);
-
- and still conform. It would have been possible for the standard to use
- "%6d\n", but it would have been difficult to pick a number that would
- have pleased everyone. This notation is thus somewhat like scanf() in
- addition to printf().
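-
- A reader of such output therefore has to tolerate surrounding <blank>s.
- The following sketch shows one way to do so in the shell; parse_int is
- a hypothetical helper, and the two sample renderings are assumptions
- chosen for illustration.
-
```shell
# Two renderings of the format "%d\n" that an implementation might produce.
plain=$(printf '%d\n' 42)
padded=$(printf '%6d\n' 42)

# Hypothetical consumer: discard surrounding <blank>s before using the value.
parse_int() {
    # The unquoted expansion is deliberate: field splitting removes the
    # leading and trailing <blank>s.
    set -- $1
    printf '%s\n' "$1"
}

parse_int "$plain"
parse_int "$padded"
```
-
- Both calls write the same digits, so a pipeline built on parse_int does
- not depend on which rendering the producing utility chose.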
-
- The printf() function was chosen as a model because most of the working group
- was familiar with it and it was thought that many of the readers would be
- as well.
-
- One difference from the C function printf() is that the l and h
- conversion characters are not used. As expressed by this standard, there
- is no differentiation between decimal values for ints versus longs versus
- shorts. The specifications %d or %i should be interpreted as an
- arbitrary-length sequence of digits. Also, no distinction is made
- between single precision and double precision numbers (float/double in
- C). These are simply referred to as floating point numbers.
-
- Many of the output descriptions in this standard use the term line, such
- as:
-
- "%s", <input line>
-
- Since the definition of line includes the trailing <newline> character
- already, there is no need to include a "\n" in the format; a double
- <newline> would otherwise result.
-
- In the language at the end of the clause:
-
- ``In no case does a nonexistent or insufficient field width
- cause truncation of a field; ...''
-
- the term ``field width'' should not be confused with the term
- ``precision'' used in the description of %s.
-
- Examples:
-
- To represent the output of a program that prints a date and time in the
- form Sunday, July 3, 10:02, where <weekday> and <month> are strings:
-
- "%s,W%sW%d,W%d:%.2d\n", <weekday>, <month>, <day>, <hour>,
- <min>
-
- To show pi written to 5 decimal places:
-
- "piW=W%.5f\n", <value of pi>
-
- To show an input file format consisting of five colon-separated fields:
-
- "%s:%s:%s:%s:%s\n", <arg1>, <arg2>, <arg3>, <arg4>, <arg5>
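-
- Two of these formats can be exercised directly in the shell. This is a
- hedged sketch: show_date and third_field are hypothetical names, each W
- in the notation is written as exactly one <space>, and the sample
- arguments are illustrative.
-
```shell
# "%s,W%sW%d,W%d:%.2d\n" with each W written as exactly one <space>.
show_date() {
    printf '%s, %s %d, %d:%.2d\n' "$1" "$2" "$3" "$4" "$5"
}

# Read one record written as "%s:%s:%s:%s:%s\n" from standard input
# and print its third field.
third_field() {
    IFS=: read -r f1 f2 f3 f4 f5
    printf '%s\n' "$f3"
}

show_date Sunday July 3 10 2
printf '%s:%s:%s:%s:%s\n' a b c d e | third_field
```
-
- The first call writes Sunday, July 3, 10:02; the precision in %.2d is
- what supplies the leading zero in the minutes field.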
-
- END_RATIONALE
-
-
-
- 2.13 Configuration Values
-
-
- 2.13.1 Symbolic Limits
-
- This clause lists magnitude limitations imposed by a specific
- implementation. The braces notation, {LIMIT}, is used in this standard
- to indicate these values, but the braces are not part of the name. The
- values specified in Table 2-16 represent the lowest values conforming
- implementations shall provide and, consequently, the largest values on
- which an application can rely without further enquiries, as described
- below. These values shall be accessible to applications via the getconf
- utility (see 4.26) and through the interfaces described in 7.8.2 [such
- as sysconf() in the C binding]. The literal names shown in the table
- apply only to the getconf utility; the high-level-language binding shall
- describe the exact form of each name to be used by the interfaces in that
- binding.
-
- Implementations may provide more liberal, or less restrictive, values
- than shown in Table 2-16. These possibly more liberal values are
- accessible using the symbols in Table 2-17.
-
- The functions in 7.8.2 [such as sysconf() in the C binding] or the
- getconf utility shall return the value of each symbol on each specific
- implementation. The value so retrieved shall be the largest, or most
- liberal, value that shall be available throughout the session lifetime,
- as determined at session creation. The literal names shown in the table
- apply only to the getconf utility; the high-level-language binding shall
- describe the exact form of each name to be used by the interfaces in that
- binding.
-
- All numerical limits defined by POSIX.1 {8}, such as {PATH_MAX}, also
- apply to this standard. (See POSIX.1 {8} 2.8.) All the utilities
- defined by this standard are implicitly limited by these values, unless
- otherwise noted in the utility descriptions.
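-
- As an illustration, a shell application might query these limits as
- follows. The helper name limit_or_minimum is hypothetical, and falling
- back to the Table 2-16 minimum when the query fails is a policy assumed
- for this sketch rather than a requirement of this standard.
-
```shell
# Print the value of a symbolic limit, or the supplied minimum when
# getconf is unavailable or does not recognize the name.
limit_or_minimum() {
    getconf "$1" 2>/dev/null || printf '%s\n' "$2"
}

line_max=$(limit_or_minimum LINE_MAX 2048)
re_dup_max=$(limit_or_minimum RE_DUP_MAX 255)
printf 'LINE_MAX=%s RE_DUP_MAX=%s\n' "$line_max" "$re_dup_max"
```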
-
-
-               Table 2-16 - Utility Limit Minimum Values
- ____________________________________________________________________
-  Name                       Description                     Value
- ____________________________________________________________________
-
-  {POSIX2_BC_BASE_MAX}       The maximum obase value         99
-                             allowed by the bc utility.
-  {POSIX2_BC_DIM_MAX}        The maximum number of elements  2048
-                             permitted in an array by the
-                             bc utility.
-  {POSIX2_BC_SCALE_MAX}      The maximum scale value         99
-                             allowed by the bc utility.
-  {POSIX2_BC_STRING_MAX}     The maximum length of a string  1000
-                             constant accepted by the bc
-                             utility.
-  {POSIX2_COLL_WEIGHTS_MAX}  The maximum number of weights   2
-                             that can be assigned to an
-                             entry of the LC_COLLATE order
-                             keyword in the locale
-                             definition file; see
-                             2.5.2.2.3.
-  {POSIX2_EXPR_NEST_MAX}     The maximum number of           32
-                             expressions that can be nested
-                             within parentheses by the expr
-                             utility.
-  {POSIX2_LINE_MAX}          Unless otherwise noted, the     2048
-                             maximum length, in bytes, of a
-                             utility's input line (either
-                             standard input or another
-                             file), when the utility is
-                             described as processing text
-                             files. The length includes
-                             room for the trailing
-                             <newline>.
-  {POSIX2_RE_DUP_MAX}        The maximum number of repeated  255
-                             occurrences of a regular
-                             expression permitted when
-                             using the interval notation
-                             \{m,n\}; see 2.8.3.3.
-  {POSIX2_VERSION}           This value indicates the        199???
-                             version of the utilities in
-                             this standard that are
-                             provided by the
-                             implementation. It will
-                             change with each published
-                             version of this standard.
- ____________________________________________________________________
-
-
-                  Table 2-17 - Symbolic Utility Limits
- ________________________________________________________________________
-                                               Minimum
-  Name                Description              Value
- ________________________________________________________________________
-
-  {BC_BASE_MAX}       The maximum obase value  {POSIX2_BC_BASE_MAX}
-                      allowed by the bc
-                      utility.
-  {BC_DIM_MAX}        The maximum number of    {POSIX2_BC_DIM_MAX}
-                      elements permitted in
-                      an array by the bc
-                      utility.
-  {BC_SCALE_MAX}      The maximum scale value  {POSIX2_BC_SCALE_MAX}
-                      allowed by the bc
-                      utility.
-  {BC_STRING_MAX}     The maximum length of a  {POSIX2_BC_STRING_MAX}
-                      string constant
-                      accepted by the bc
-                      utility.
-  {COLL_WEIGHTS_MAX}  The maximum number of    {POSIX2_COLL_WEIGHTS_MAX}
-                      weights that can be
-                      assigned to an entry of
-                      the LC_COLLATE order
-                      keyword in the locale
-                      definition file; see
-                      2.5.2.2.3.
-  {EXPR_NEST_MAX}     The maximum number of    {POSIX2_EXPR_NEST_MAX}
-                      expressions that can be
-                      nested within
-                      parentheses by the expr
-                      utility.
-  {LINE_MAX}          Unless otherwise noted,  {POSIX2_LINE_MAX}
-                      the maximum length, in
-                      bytes, of a utility's
-                      input line (either
-                      standard input or
-                      another file), when the
-                      utility is described as
-                      processing text files.
-                      The length includes
-                      room for the trailing
-                      <newline>.
-  {RE_DUP_MAX}        The maximum number of    {POSIX2_RE_DUP_MAX}
-                      repeated occurrences of
-                      a regular expression
-                      permitted when using
-                      the interval notation
-                      \{m,n\}; see 2.8.3.3.
- ________________________________________________________________________
-
-
- It is not guaranteed that the application can in fact push a value to the
- implementation's specified limit in any given case, or at all, as a lack
- of virtual memory or other resources may prevent this. The limit value
- indicates only that the implementation does not specifically impose any
- arbitrary, more restrictive limit.
-
- BEGIN_RATIONALE
-
-
- 2.13.1.1 Symbolic Limits Rationale. (This subclause is not a part of
- P1003.2)
-
- This clause grew out of an idea that originated in POSIX.1 {8}, in the
- form of sysconf() and pathconf(). (In fact, the same person wrote the
- original text for both standards.) The idea is that a Strictly
- Conforming POSIX.2 Application can be written to use the most restrictive
- values that a minimal system can provide, but it shouldn't have to. The
- values shown in Table 2-17 represent compromises so that some vendors can
- use historically-limited versions of UNIX system utilities. They are the
- highest values that Strictly Conforming POSIX.2 Applications or
- Conforming POSIX.2 Applications can assume, given no other information.
-
- However, by using getconf or sysconf(), the elegant application can
- tailor itself to the more liberal values on some of the specific
- instances of specific implementations.
-
- There is no explicitly-stated requirement that an implementation provide
- finite limits for any of these numeric values; the implementation is free
- to provide essentially unbounded capabilities (where it makes sense),
- stopping only at reasonable points such as {ULONG_MAX} (from the
- C Standard {7} via POSIX.1 {8}). Therefore, applications desiring to
- tailor themselves to the values on a particular implementation need to be
- ready for possibly huge values; it may not be a good idea to blindly
- allocate a buffer for an input line based on the value of {LINE_MAX}, for
- instance. However, unlike POSIX.1 {8}, there is no set of limits in this
- standard that return a special indication meaning ``unbounded.'' The
- implementation should always return an actual number, even if the number
- is very large.
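-
- One defensive pattern is to cap the reported value before sizing any
- buffer. The following is a minimal sketch only; buffer_size and the cap
- of 65536 bytes are assumptions of this example, not values taken from
- this standard.
-
```shell
# Print the smaller of a reported limit and an application-chosen cap.
buffer_size() {
    limit=$1    # value reported for {LINE_MAX}
    cap=$2      # largest buffer the application is willing to allocate
    if [ "$limit" -gt "$cap" ]; then
        printf '%s\n' "$cap"
    else
        printf '%s\n' "$limit"
    fi
}

buffer_size 2048 65536          # a small limit is used as reported
buffer_size 4294967295 65536    # a huge limit is reduced to the cap
```
-
- An application using such a cap still has to handle input lines longer
- than its buffer, for example by reading them in pieces.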
-
- The statement
-
- ``It is not guaranteed that the application ...''
-
- is an indication that many of these limits are designed to ensure that
- implementors design their utilities without arbitrary constraints related
- to unimaginative programming. There are certainly conditions under which
- combinations of options can cause failures that would not render an
- implementation nonconforming. For example, {EXPR_NEST_MAX} and {ARG_MAX}
- could collide when expressions are large; combinations of {BC_SCALE_MAX}
- and {BC_DIM_MAX} could exceed virtual memory.
-
- In POSIX.2, the notion of a limit being guaranteed for the process
- lifetime, as it is in POSIX.1 {8}, is not as useful to a shell script.
- The getconf utility is probably a process itself, so the guarantee would
- be valueless. Therefore, POSIX.2 requires the guarantee to be for the
- session lifetime. This will mean that many vendors will either return
- very conservative values or possibly implement getconf as a built-in.
-
- It may seem confusing to have limits that apply only to a single utility
- grouped into one global clause. However, the alternative, which would be
- to disperse them out into their utility description clauses, would cause
- great difficulty when sysconf() and getconf were described. Therefore,
- the working group chose the global approach.
-
- Each language binding could provide symbol names that are slightly
- different from those shown here. For example, the C binding prefixes the
- symbols with a leading underscore.
-
- The following comments describe selection criteria for the symbols and
- their values.
-
- {ARG_MAX}
- This is defined by POSIX.1 {8}. Unfortunately, it is very
- difficult for a portable application to deal with this value, as
- it does not know how much of its argument space is being
- consumed by the user's environment variables.
-
- {BC_BASE_MAX}
- {BC_DIM_MAX}
- {BC_SCALE_MAX}
- These were originally one value, {BC_SCALE_MAX}, but it was
- unreasonable to link all three concepts into one limit.
-
- {CHILD_MAX}
- This is defined by POSIX.1 {8}.
-
- {CUT_FIELD_MAX}
- This value was removed from an earlier draft. It represented
- the maximum length of the list argument to the cut -c or -f
- options. Since the length is now unspecified, the utility
- should have to deal with arbitrarily long lists, as long as
- {ARG_MAX} is not exceeded.
-
- {CUT_LINE_MAX}
- This value was removed from an earlier draft. Historical cuts
- have had input line limits of 1024; this removal therefore
- mandates that a conforming cut shall process files with lines of
- unlimited length.
-
- {DEPTH_MAX}
- This directory-traversing depth limit (which at one time applied
- to rm and find) was removed from an earlier draft for two major
- reasons:
-
- (1) It could be a security problem if utilities searching for
- files could not descend below a published depth; this
- would be a semi-reliable means of hiding files from the
- administrator.
-
- (2) There is no reason a reasonable implementation should have
- to limit itself in this way.
-
- {ED_FILE_MAX}
- This value was removed from an earlier draft. Historical eds
- have had very small file limits; since {ED_FILE_MAX} is no
- longer specified, implementations have to document the limits as
- described in 2.11. It is recommended that implementations set
- much more reasonable file size limits as they modify ed to deal
- with other features required by POSIX.2.
-
- {ED_LINE_MAX}
- This value was removed from an earlier draft. Historical eds
- have had small input line limits; this removal therefore
- mandates that a conforming ed shall process files with lines of
- length {LINE_MAX}.
-
- {COLL_WEIGHTS_MAX}
- The weights assigned to order can be considered as ``passes''
- through the collation algorithm.
-
- {EXPR_NEST_MAX}
- The value for expression nesting was borrowed from the
- C Standard {7}.
-
- {FIND_DEPTH_MAX}
- This was removed from an earlier draft in favor of a common
- value, {DEPTH_MAX}.
-
- {FIND_FILESYS_MAX}
- This was removed from an earlier draft. It indicated the limit
- of the number of file systems that find could traverse in its
- search. It was dropped because this standard does not really
- acknowledge the historical nature of separate file systems.
-
- {FIND_NEWER_MAX}
- This value, which allowed find to limit the number of -newer
- operands it processed, was deleted from an earlier draft. It
- was felt to be a vestige of a particular implementation with an
- incorrect programming algorithm that should not limit
- applications.
-
- {JOIN_LINE_MAX}
- This value was removed from an earlier draft. Historical joins
- have had input line limits of 1024; this removal therefore
- mandates that a conforming join shall process files with lines
- of length {LINE_MAX}.
-
- {LINE_MAX}
- This is a global limit that affects all utilities, unless
- otherwise noted. The {MAX_CANON} value from POSIX.1 {8} may
- further limit input lines from terminals. The {LINE_MAX} value
- was the subject of much debate and is a compromise between those
- who wished unlimited lines and those who understood that many
- historical utilities were written with fixed buffers.
- Frequently, utility writers selected the UNIX system constant
- BUFSIZ to allocate these buffers; therefore, some utilities were
- limited to 512 bytes for I/O lines, while others achieved 4096
- or greater.
-
- It should be noted that {LINE_MAX} applies only to input line
- length; there is no requirement in the standard that limits the
- length of output lines. Utilities such as awk, sed, and paste
- could theoretically construct lines longer than any of the input
- lines they received, depending on the options used or the
- instructions from the application. They are not required to
- truncate their output to {LINE_MAX}. It is the responsibility
- of the application to deal with this. If the output of one of
- those utilities is to be piped into another of the standard
- utilities, line length restrictions will have to be considered;
- the fold utility, among others, could be used to ensure that
- only reasonable line lengths reach utilities or applications.
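-
- As a minimal sketch of that precaution, the fragment below folds
- overlong output before it reaches another utility. The helper name
- limit_lines is hypothetical, and the fallback of 2048 is an assumption
- used only when getconf cannot report {LINE_MAX}.
-
```shell
# Largest input line, in bytes including the <newline>, that standard
# utilities are required to accept; fall back to 2048 if getconf fails.
line_max=$(getconf LINE_MAX 2>/dev/null) || line_max=2048

# limit_lines [width]: hypothetical helper that breaks any line longer than
# width bytes (default: LINE_MAX minus one byte for the <newline>).
limit_lines() {
    fold -b -w "${1:-$((line_max - 1))}"
}

# paste -s can build one line longer than any of its input lines;
# a small width is used here only to make the folding visible.
printf '%s\n' alpha beta gamma | paste -s -d ' ' - | limit_lines 8
```
-
- Folding changes the data, so it suits only consumers that can accept
- lines split at arbitrary byte positions.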
-
- {LINK_MAX}
- This is defined by POSIX.1 {8}.
-
- {LP_LINE_MAX}
- This value was removed from an earlier draft. Since so little
- is being required for the details of the lp utility, it made
- little sense to specify how long its output lines are. Thus,
- implementations of lp will be expected to deal with lines up to
- {LINE_MAX}, but whether those lines print sensibly on every
- device is unspecified.
-
- {MAX_CANON}
- This is defined by POSIX.1 {8}.
-
- {MAX_INPUT}
- This is defined by POSIX.1 {8}.
-
- {NAME_MAX}
- This is defined by POSIX.1 {8}.
-
- {NGROUPS_MAX}
- This is defined by POSIX.1 {8}.
-
- {OPEN_MAX}
- This is defined by POSIX.1 {8}.
-
- {PATH_MAX}
- This is defined by POSIX.1 {8}.
-
- {PIPE_BUF}
- This is defined by POSIX.1 {8}.
-
- {RM_DEPTH_MAX}
- This was removed from an earlier draft in favor of a common
- value, {DEPTH_MAX}.
-
- {RE_DUP_MAX}
- The value selected is consistent with historical practice.
-
- {SED_PATTERN_MAX}
- This symbolic value, the size of the sed pattern space, was
- replaced by a specific value in the sed description. It is
- unlikely that any real application would ever need to access
- this value symbolically.
-
- {SORT_LINE_MAX}
- This was removed from an earlier draft. Now that cut and fold
- can handle unlimited-length input lines, a special long input
- line limit for sort is not needed.
-
- There are different limits associated with command lines and input to
- utilities, depending on the method of invocation. In the case of a C
- program exec-ing a utility, {ARG_MAX} is the underlying limit. In the
- case of the shell reading a script and exec-ing a utility, {LINE_MAX}
- limits the length of lines the shell is required to process and {ARG_MAX}
- will still be a limit. If a user is entering a command on a terminal to
- the shell, requesting that it invoke the utility, {MAX_INPUT} may
- restrict the length of the line that can be given to the shell to a value
- below {LINE_MAX}.
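-
- The three limits can be inspected from a script, sketched here under
- some assumptions: the helper name query_limit is hypothetical, the
- fallback numbers are placeholders used when getconf gives no numeric
- answer, and "/" merely stands in for the pathname of a terminal device
- when {MAX_INPUT} is queried.
-
```shell
# query_limit name fallback [pathname]: hypothetical helper that prints the
# value getconf reports for name, or fallback when no numeric value is
# available (getconf missing, or the limit reported as "undefined").
query_limit() {
    value=$(getconf "$1" ${3:+"$3"} 2>/dev/null) || value=
    case $value in
        ''|*[!0-9]*) value=$2 ;;
    esac
    printf '%s\n' "$value"
}

arg_max=$(query_limit ARG_MAX 4096)        # applies whenever a utility is exec-ed
line_max=$(query_limit LINE_MAX 2048)      # applies to lines of a shell script
max_input=$(query_limit MAX_INPUT 255 /)   # applies to lines typed at a terminal

printf 'ARG_MAX=%s LINE_MAX=%s MAX_INPUT=%s\n' \
    "$arg_max" "$line_max" "$max_input"
```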
-
- END_RATIONALE
-
-
- 2.13.2 Symbolic Constants for Portability Specifications
-
-
- Table 2-18 - Optional Facility Configuration Values
- _________________________________________________________________________
- Name                 Description
- _________________________________________________________________________
-
- {POSIX2_C_BIND}      The C language development facilities in
-                      Annex A support the C Language Bindings
-                      Option (see Annex B).
- {POSIX2_C_DEV}       The system supports the C Language
-                      Development Utilities Option (see
-                      Annex A).
- {POSIX2_FORT_DEV}    The system supports the FORTRAN
-                      Development Utilities Option (see
-                      Annex C).
- {POSIX2_FORT_RUN}    The system supports the FORTRAN Runtime
-                      Utilities Option (see Annex C).
- {POSIX2_LOCALEDEF}   The system supports the creation of
-                      locales as described in 4.35.
- {POSIX2_SW_DEV}      The system supports the Software
-                      Development Utilities Option (see Section
-                      6).
- _________________________________________________________________________
-
-
- Table 2-18 lists symbols that can be used by the application to determine
- which optional facilities are present on the implementation. The
- functions defined in 7.8.2 [such as sysconf()] or the getconf utility can
- be used to retrieve the value of each symbol on each specific
- implementation. The literal names shown in the table apply only to the
- getconf utility; the high-level-language binding shall describe the exact
- form of each name to be used by the interfaces in that binding.
-
- Each of these symbols shall be considered a valid name by the
- implementation. Each shall be defined on the system with a value of 1 if
- the corresponding option is supported; otherwise, the symbol shall be
- undefined.
-
- BEGIN_RATIONALE
-
-
- 2.13.2.1 Symbolic Constants for Portability Specifications Rationale.
- (This subclause is not a part of P1003.2)
-
- When an option is supported, getconf returns a value of 1. For example,
- when C development is supported:
-
- if [ "$(getconf POSIX2_C_DEV)" = 1 ]; then
-     echo C supported
- fi
-
- The sysconf() function in the C binding would return 1.
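-
- The same check can be applied to every symbol in Table 2-18. The
- following is only a sketch: the helper name option_supported is
- hypothetical, and it follows this draft's rule that a supported option
- has the value 1. Implementations of later POSIX revisions may report a
- different value for a supported option, so the comparison would need
- adjusting there.
-
```shell
# option_supported name: hypothetical helper; succeeds only when getconf
# reports the value 1 for the named option. An unsupported option is
# reported as "undefined" (or getconf fails), so compare as a string.
option_supported() {
    [ "$(getconf "$1" 2>/dev/null)" = 1 ]
}

for opt in POSIX2_C_BIND POSIX2_C_DEV POSIX2_FORT_DEV POSIX2_FORT_RUN \
           POSIX2_LOCALEDEF POSIX2_SW_DEV
do
    if option_supported "$opt"; then
        echo "$opt: supported"
    else
        echo "$opt: not supported"
    fi
done
```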
-
- The following comments describe selection criteria for the symbols and
- their values.
-
- {POSIX2_C_BIND}
- {POSIX2_C_DEV}
- {POSIX2_FORT_DEV}
- {POSIX2_SW_DEV}
- These were renamed from _POSIX_* in Draft 9 after it was pointed
- out that each of the POSIX standards should generally keep to
- its own namespace.
-
- It is possible for some (usually privileged) operations to
- remove utilities that support these options, or otherwise render
- these options unsupported. The header files, the sysconf()
- function, or the getconf utility will not necessarily detect
- such actions, in which case they should not be considered as
- rendering the implementation nonconforming. A test suite should
- not attempt tests like:
-
- rm /usr/bin/c89
- getconf POSIX2_C_DEV
-
- {POSIX2_LOCALEDEF}
- This symbol was introduced to allow implementations to restrict
- supported locales to only those supplied by the implementation.
-
- END_RATIONALE
-
-
- Section 3: Shell Command Language
-
-
-
- The shell is a command language interpreter. This section describes the
- syntax of that command language as it is used by the sh utility and the
- functions in 7.1 [such as system() and popen() in the C binding].
-
- The shell operates according to the following general overview of
- operations. The specific details are included in the cited clauses and
- subclauses of this section. The shell:
-
- (1) Reads its input from a file (see sh in 4.56), from the -c
- option, or from one of the functions in 7.1. If the first line
- of a file of shell commands starts with the characters #!, the
- results are unspecified.
-
- (2) Breaks the input into tokens: words and operators. (See 3.3.)
-
- (3) Parses the input into simple (3.9.1) and compound (3.9.4)
- commands.
-
- (4) Performs various expansions (separately) on different parts of
- each command, resulting in a list of pathnames and fields to be
- treated as a command and arguments (3.6).
-
- (5) Performs redirection (3.7) and removes redirection operators and
- their operands from the parameter list.
-
- (6) Executes a function (3.9.5), built-in (3.14), executable file,
- or script, giving the name of the command (or, in the case of a
- function within a script, the name of the script) as the
- ``zero'th'' argument and the remaining words and fields as
- parameters (3.9.1.1).
-
- (7) Optionally waits for the command to complete and collects the
- exit status (3.8.2).
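-
- The steps above can be traced on a single command line. This is an
- illustrative sketch only; the helper show_args and the scratch file
- name are hypothetical.
-
```shell
# show_args: hypothetical helper that reports how many parameters it
# received and prints each one on its own line.
show_args() {
    printf 'count=%s\n' "$#"
    for arg in "$@"; do
        printf '<%s>\n' "$arg"
    done
}

greeting='hello world'
outfile=${TMPDIR:-/tmp}/overview.$$

# Steps 2-5: the line is split into tokens, $greeting is expanded, and the
# redirection (> "$outfile") is performed and removed from the parameters.
# Step 6 runs the function; step 7 waits and collects the exit status.
show_args "$greeting" second > "$outfile"
status=$?

cat "$outfile"
echo "exit status: $status"
rm -f "$outfile"
```
-
- The function sees two parameters: the redirection operator and its
- operand never reach it.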
-
- BEGIN_RATIONALE
-
-
- 3.0.1 Shell Command Language Rationale. (This subclause is not a part of
- P1003.2)
-
- The System V shell was selected as the starting point for this standard.
- The BSD C-shell was excluded from consideration, for the following
- reasons:
-
- (1) Most historically portable shell scripts assume the Version 7
- ``Bourne'' shell, from which the System V shell is derived.
-
- (2) The majority of tutorial materials on shell programming assume
- the System V shell.
-
- Despite the selection of the System V shell, the developers of the
- standard did not limit the possibilities for a shell command language
- that was upward-compatible.
-
- The only programmatic interfaces to the shell language are through the
- functions in 7.1 and the sh utility. Most implementations provide an
- interface to, and processing mode for, the shell that is suitable for
- direct user interaction. The behavior of this interactive mode is not
- defined by this standard; however, places where historically an
- interactive shell behaves differently from the behavior described here
- are noted.
-
- (1) Aliases are not included in the base POSIX.2 because they
- duplicate functionality already available to applications with
- functions. In early drafts, the search order of simple command
- lookup was ``aliases, built-ins, functions, file system,'' and
- therefore an alias was necessary to create a user-defined
- command having the same name as a built-in. To retain this
- capability, the search order has changed to ``special built-ins,
- functions, built-ins, file system,'' and a built-in, called
- command, has been added, which disables the looking up of
- functions. Aliases are a part of the POSIX.2a UPE because they
- are widely used by human users, as differentiated from
- applications.
-
- (2) All references to job control and related commands have been
- omitted from the base POSIX.2. POSIX.2 describes the
- noninteractive operation of the shell; job control is outside
- the scope of this standard until the UPE revision is developed.
- Apparently it is not widely known that traditionally, even in a
- job control environment, the commands executed during the
- execution of a shell script are not placed into separate process
- groups. If they were, one could not stop the execution of the
- shell script from the interactive shell, for example. This
- standard does not require or prohibit job control; it simply
- does not mention it.
-
- (3) The conditional command (double bracket [[ ]]) was removed from
- an earlier draft. Objections were lodged that the real problem
- is misuse of the test command ([), and putting it into the shell
- is the wrong way to fix the problem. Instead, proper
- documentation and a new shell reserved word (!) are sufficient.
- Tests that require multiple test operations can be done at the
- shell level using individual invocations of the test command and
- shell logicals, rather than the error-prone -o flag of test.
-
- (4) Exportable functions were removed from an earlier draft. See
- the rationale in 3.9.5.1.
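-
- Items (1) and (3) above can be sketched as follows. The names cd_count
- and is_file_or_dir are hypothetical, and the example assumes that cd is
- a regular built-in, so that a function of the same name is found first.
-
```shell
# Item (1): functions are found before regular built-ins, so a function can
# replace cd; the command built-in bypasses the function lookup and so
# avoids infinite recursion. cd_count is a hypothetical counter.
cd_count=0
cd() {
    command cd "$@" && cd_count=$((cd_count + 1))
}

# Item (3): combine separate test invocations with the shell's own || (or
# &&) rather than the -o flag of test.
is_file_or_dir() {
    [ -f "$1" ] || [ -d "$1" ]
}

cd . && echo "cd called $cd_count time(s)"
if is_file_or_dir /; then echo "/ is a file or directory"; fi
if ! is_file_or_dir /no/such/path; then echo "no such path"; fi
```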
-
- The construct #! is reserved for implementations wishing to provide that
- extension. If it were not reserved, the standard would disallow it by
- forcing it to be a comment. As it stands, a conforming application shall
- not use #! as the first line of a shell script.
-
- END_RATIONALE
-
-
-
- 3.1 Shell Definitions
-
- The following terms are used in Section 3. Because they are specific to
- the shell, they do not appear in 2.2.2.
-
- 3.1.1 control operator: A token that performs a control function.
-
- It is one of the following symbols:
-
- & ) <newline>
- && ; |
- ( ;; ||
-
- The end-of-input indicator used internally by the shell is also
- considered a control operator. See 3.3.
-
- On some systems, the symbol (( is a control operator; its use produces
- unspecified results.
-
- 3.1.2 expand: When not qualified, the act of applying all the
- expansions described in 3.6.
-
- 3.1.3 field: A unit of text that is the result of parameter expansion
- (3.6.2), arithmetic expansion (3.6.4), command substitution (3.6.3), or
- field splitting (3.6.5).
-
- During command processing (see 3.9.1), the resulting fields are used as
- the command name and its arguments.
-
- 3.1.4 interactive shell: A processing mode of the shell that is
- suitable for direct user interaction.
-
- The behavior in this mode is not defined by this standard.
-
- NOTE: The preceding sentence is expected to change following the
- eventual approval of the UPE supplement.
-
- 3.1.5 name: A word consisting solely of underscores, digits, and
- alphabetics from the portable character set (see 2.4).
-
- The first character of a name shall not be a digit.
-
- 3.1.6 operator: Either a control operator or a redirection operator.
-
- 3.1.7 parameter: An entity that stores values.
-
- There are three types of parameters: variables (named parameters),
- positional parameters, and special parameters. Parameter expansion is
- accomplished by introducing a parameter with the $ character. See 3.5.
-
- 3.1.8 positional parameter: A parameter denoted by a single digit or
- one or more digits in curly braces.
-
- See 3.5.1.
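-
- A short sketch of why the braces matter for parameters beyond the
- ninth; the helper name tenth_arg is hypothetical.
-
```shell
# tenth_arg: hypothetical helper contrasting $10 with ${10}.
tenth_arg() {
    printf 'unbraced: %s\n' "$10"    # parsed as ${1} followed by a literal 0
    printf 'braced: %s\n' "${10}"    # the tenth positional parameter
}

tenth_arg a b c d e f g h i j
```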
-
- 3.1.9 redirection: A method of associating files with the input/output
- of commands.
-
- See 3.7.
-
- 3.1.10 redirection operator: A token that performs a redirection
- function.
-
- It is one of the following symbols:
-
- < > >| << >> <& >& <<- <>
-
- 3.1.11 special parameter: A parameter named by a single character from
- the following list:
-
- * @ # ? ! - $ 0
-
- See 3.5.2.
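-
- As a brief illustration, a few of these parameters can be printed from
- inside a function. The helper name show_special is hypothetical; 3.5.2
- remains the authority on what each parameter means.
-
```shell
# show_special: hypothetical helper that prints three special parameters.
show_special() {
    printf 'count (#): %s\n' "$#"     # number of positional parameters
    printf 'all (*): %s\n' "$*"       # all positional parameters, joined
    printf 'status (?): %s\n' "$?"    # exit status of the preceding printf
}

show_special one 'two words' three
```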
-
- 3.1.12 subshell: A shell execution environment, distinguished from the
- main or current shell execution environment by the attributes described
- in 3.12.
-
- 3.1.13 token: A sequence of characters that the shell considers as a
- single unit when reading input, according to the rules in 3.3.
-
- A token is either an operator or a word.
-
- 3.1.14 variable: A named parameter. See 3.5.
-
- 3.1.15 variable assignment [assignment]: A word consisting of the
- following parts
-
- varname=value
-
- When used in a context where assignment is defined to occur (see 3.9.1)
- and at no other time, the value (representing a word or field) shall be
- assigned as the value of the variable denoted by varname. The varname and
- value parts meet the requirements for a name and a word, respectively,
- except that they are delimited by the embedded unquoted equals-sign in
- addition to the delimiting described in 3.3. In all cases, the variable
- shall be created if it did not already exist. If value is not specified,
- the variable shall be given a null value.
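-
- A small sketch of these rules; the variable names are arbitrary.
-
```shell
unset greeting empty

greeting=hello      # assignment: creates the variable
empty=              # no value given: the variable is set to the null string

# Here greeting=bye follows the command name, so it is an ordinary
# argument to printf and no assignment takes place.
printf '%s\n' greeting=bye

printf 'greeting is "%s"\n' "$greeting"
printf 'empty is %s\n' "${empty+set but null}"
```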
-
- An alternative form of variable assignment:
-
- symbol=value
-
- (where symbol is a valid word delimited by an equals-sign, but not a
- valid name) produces unspecified results.
-
- 3.1.16 word: A token other than an operator.
-
- In some cases a word is also a portion of a word token: in the various
- forms of parameter expansion (3.6.2), such as ${name-word}, and variable
- assignment, such as name=word, the word is the portion of the token
- depicted by word. The concept of a word is no longer applicable following
- word expansions--only fields remain; see 3.6.
-
- BEGIN_RATIONALE
-
-
- 3.1.17 Shell Definitions Rationale. (This subclause is not a part of
- P1003.2)
-
- The word=word form of variable assignment was included, producing
- unspecified results, to allow the KornShell name[expression]=value syntax
- to conform.
-
- The (( symbol is a control operator in the KornShell, used for an
- alternative syntax of an arithmetic expression command. A strictly
- conforming POSIX.2 application cannot use (( as a single token [with the
- obvious exception of the $(( form described in POSIX.2]. The decision to
- require this is based solely on the pragmatic knowledge that there are
- many more historical shell scripts using the KornShell syntax than there
- might be using nested subshells, such as
-
- ((foo)) or ((foo);(bar))
-
- The latter example should not be misinterpreted by the shell as
- arithmetic because attempts to balance the parentheses pairs would
- indicate that they are subshells. Thus, in most cases, while a few
- scripts will no longer be strictly portable, the chance of breaking
- existing scripts is even smaller.
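-
- A script that needs nested subshells can avoid the issue by separating
- the opening parentheses with a blank, so that (( is never presented as
- a single token. A minimal sketch, with a hypothetical function name:
-
```shell
# nested: hypothetical helper running a subshell inside a subshell. The
# blank between the two opening parentheses keeps them separate tokens.
nested() {
    ( (echo inner); echo outer )
}

nested
```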
-
- There are no explicit limits in this standard on the sizes of names,
- words, lines, or other objects. However, other implicit limits do apply:
- shell script lines produced by many of the standard utilities cannot
- exceed {LINE_MAX} and the sum of exported variables comes under the
- {ARG_MAX} limit. Historical shells dynamically allocate memory for names
- and words and parse incoming lines a byte at a time. Lines cannot have
- an arbitrary {LINE_MAX} limit because of historical practice such as
- makefiles, where make removes the <newline>s associated with the commands
- for a target and presents the shell with one very long line. The text in
- 2.11.5.2 does allow a shell to run out of memory, but it cannot have
- arbitrary programming limits.
-
- END_RATIONALE
-
-
-
- 3.2 Quoting
-
- Quoting is used to remove the special meaning of certain characters or
- words to the shell. Quoting can be used to preserve the literal meaning
- of the special characters in the next paragraph; prevent reserved words
- from being recognized as such; and prevent parameter expansion and
- command substitution within here-document processing (see 3.7.4).
-
- The following characters shall be quoted if they are to represent
- themselves:
-
- | & ; < > ( ) $ ` \ " '
- <space> <tab> <newline>
-
- and the following may need to be quoted under certain circumstances.
- That is, these characters may be special depending on conditions
- described elsewhere in the standard:
-
- * ? [ # ~ = %
-
- The various quoting mechanisms are the escape character, single-quotes,
- and double-quotes. The here-document represents another form of quoting;
- see 3.7.4.
-
-
- 3.2.1 Escape Character (Backslash)
-
- A backslash that is not quoted shall preserve the literal value of the
- following character, with the exception of a <newline>. If a <newline>
- follows the backslash, the shell shall interpret this as line
- continuation. The backslash and <newline> shall be removed before
- splitting the input into tokens.
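-
- Both uses of the backslash can be seen in a short sketch; the helper
- name count_args is hypothetical.
-
```shell
# count_args: hypothetical helper that prints how many arguments it received.
count_args() { echo "$#"; }

count_args a\ b        # escaped <space>: one argument
count_args a b         # unescaped <space>: two arguments

# The backslash-<newline> pair is removed before tokenizing and is not
# replaced by white space, so the two halves form a single word.
joined=foo\
bar
printf '%s\n' "$joined"
```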
-
-
- 3.2.2 Single-Quotes
-
- Enclosing characters in single-quotes (' ') shall preserve the literal
- value of each character within the single-quotes. A single-quote cannot
- occur within single-quotes.
-
-
- 3.2.3 Double-Quotes
-
- Enclosing characters in double-quotes (" ") shall preserve the literal
- value of all characters within the double-quotes, with the exception of
- the characters dollar-sign, backquote, and backslash, as follows:
-
- $ The dollar-sign shall retain its special meaning introducing
- parameter expansion (see 3.6.2), a form of command substitution
- (see 3.6.3), and arithmetic expansion (see 3.6.4).
-
- The input characters within the quoted string that are also
- enclosed between $( and the matching ) shall not be affected by
- the double-quotes, but rather shall define that command whose
- output replaces the $(...) when the word is expanded. The
- tokenizing rules in 3.3 shall be applied recursively to find the
- matching ).
-
- Within the string of characters from an enclosed ${ to the
- matching }, an even number of unescaped double-quotes or
- single-quotes, if any, shall occur. A preceding backslash
- character shall be used to escape a literal { or }. The rule in
- 3.6.2 shall be used to determine the matching }.
-
- ` The backquote shall retain its special meaning introducing the
- other form of command substitution (see 3.6.3). The portion of
- the quoted string from the initial backquote and the characters
- up to the next backquote that is not preceded by a backslash,
- having escape characters removed, defines that command whose
-
- output replaces `...` when the word is expanded. Either of the
- following cases produces undefined results:
-
- - A single- or double-quoted string that begins, but does not
- end, within the `...` sequence.
-
- - A `...` sequence that begins, but does not end, within the
- same double-quoted string.
-
- \ The backslash shall retain its special meaning as an escape
- character (see 3.2.1) only when followed by one of the
- characters:
-
- $ ` " \ <newline>
-
- A double-quote shall be preceded by a backslash to be included within
- double-quotes. The parameter @ has special meaning inside double-quotes
- and is described in 3.5.2.
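-
- The double-quote rules can be exercised with a few assignments. This
- is an illustrative sketch; all variable names are arbitrary.
-
```shell
name=world

expanded="hello $name"       # $ keeps its meaning: parameter expansion
summed="sum: $((2 + 3))"     # arithmetic expansion
nested="sub: $(echo "a b")"  # quotes inside $( ) are parsed independently
dollar="cost: \$5"           # \$ yields a literal dollar-sign
quoted="say \"hi\""          # \" embeds a double-quote
kept="dir: C:\tmp"           # \t is not in the list, so the backslash stays

printf '%s\n' "$expanded" "$summed" "$nested" "$dollar" "$quoted" "$kept"
```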
-
- BEGIN_RATIONALE
-
-
- 3.2.4 Quotes Rationale. (This subclause is not a part of P1003.2)
-
- A backslash cannot be used to escape a single-quote in a single-quoted
- string. An embedded quote can be created by writing, for example,
- 'a'\''b', which yields a'b. (See 3.6.5 for a better understanding of how
- portions of words are either split into fields or remain concatenated.)
- A single token can be made up of concatenated partial strings containing
- all three kinds of quoting/escaping, thus permitting any combination of
- characters.
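-
- A brief sketch of such concatenation; the variable names are
- arbitrary.
-
```shell
# One word assembled from three adjacent pieces: a single-quoted string,
# an escaped single-quote, and another single-quoted string.
word='a'\''b'
printf '%s\n' "$word"

# All three quoting mechanisms concatenated into a single token:
# single-quotes, double-quotes, and a backslash escape.
mixed='single $x '"double \"q\" "\*
printf '%s\n' "$mixed"
```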
-
- The escaped <newline> used for line continuation is removed entirely from
- the input and is not replaced by any white space. Therefore, it cannot
- serve as a token separator.
-
- In double-quoting, if a backslash is immediately followed by a character
- that would be interpreted as having a special meaning, the backslash is
- deleted and the subsequent character is taken literally. If a backslash
- does not precede a character that would have a special meaning, it is
- left in place unmodified and the character immediately following it is
- also left unmodified. Thus, for example:
-
- "\$" => $
-
- "\a" => \a
-
- It would be desirable to include the statement ``The characters from an
- enclosed ${ to the matching } shall not be affected by the double-
- quotes,'' similar to the one for $( ). However, historical practice in
- the System V shell prevents this. The requirement that double-quotes be
- matched inside ${...} within double-quotes and the rule for finding the
- matching } in 3.6.2 eliminate several subtle inconsistencies in expansion
- for historical shells in rare cases; for example,
-
- "${foo-bar"}
-
- yields bar when foo is not defined, and is an invalid substitution when
- foo is defined, in many historical shells. The differences in processing
- the "${...}" form have led to inconsistencies between the historical
- System V, BSD, and KornShells, and the text in POSIX.2 is an attempt to
- converge them without breaking many applications. A consequence of the
- new rule is that single-quotes cannot be used to quote the } within
- "${...}"; for example
-
- unset bar
- foo="${bar-'}'}"
-
- is invalid because the "${...}" substitution contains an unpaired
- unescaped single-quote. The backslash can be used to escape the } in
- this example to achieve the desired result:
-
- unset bar
- foo="${bar-\}}"
-
- The only alternative to this compromise between shells would be to make
- the behavior unspecified whenever the literal characters ', {, }, and "
- appear within ${...}. To write a portable script that uses these values,
- a user would have to assign variables, say,
-
- squote=\' dquote=\" lbrace='{' rbrace='}'
- ${foo-$squote$rbrace$squote}
-
- rather than
-
- ${foo-"'}'"}
-
- Some systems have allowed the end of the word to terminate the backquoted
- command substitution, such as in
-
- "`echo hello"
-
- This usage is undefined in POSIX.2, where the matching backquote is
- required. The other undefined usage can be illustrated by the example:
-
- sh -c '` echo "foo`'
-
- The description of the recursive actions involving command substitution
- can be illustrated with an example. Upon recognizing the introduction of
- command substitution, the shell must parse input (in a new context),
- gathering the ``source'' for the command substitution until an unbalanced
- ) or ` is located. For example, in the following
-
- echo "$(date; echo "
- one" )"
-
- the double-quote following the echo does not terminate the first double-
- quote; it is part of the command substitution ``script.'' Similarly, in
-
- echo "$(echo *)"
-
- the asterisk is not quoted since it is inside command substitution;
- however,
-
- echo "$(echo "*")"
-
- is quoted (and represents the asterisk character itself).
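-
- The last two cases can be compared in a scratch directory holding two
- known files. This is a sketch only: it assumes the mktemp utility,
- which this standard does not define, is available.
-
```shell
# Scratch directory with two known files; mktemp is an assumption.
scratch=$(mktemp -d) || exit 1
: > "$scratch/one"
: > "$scratch/two"

# Inside $( ), the inner asterisk is unquoted and expands to file names.
unquoted=$(cd "$scratch" && echo "$(echo *)")

# Here the inner double-quotes apply, so the asterisk stays literal.
quoted=$(cd "$scratch" && echo "$(echo "*")")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted: %s\n' "$quoted"
rm -r "$scratch"
```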
-
- END_RATIONALE
-
-
-
- 3.3 Token Recognition
-
- The shell reads its input in terms of lines from a file, from a terminal
- in the case of an interactive shell, or from a string in the case of
- sh -c or system(). The input lines can be of unlimited length. These
- lines are parsed using two major modes: ordinary token recognition and
- processing of here-documents.
-
- When an io_here token has been recognized by the grammar (see 3.10), one
- or more of the immediately subsequent lines form the body of one or more
- here-documents and shall be parsed according to the rules of 3.7.4.
-
- When it is not processing an io_here, the shell shall break its input
- into tokens by applying the first applicable rule below to the next
- character in its input. The token shall be from the current position in
- the input until a token is delimited according to one of the rules below;
- the characters forming the token are exactly those in the input,
- including any quoting characters. If it is indicated that a token is
- delimited, and no characters have been included in a token, processing
- shall continue until an actual token is delimited.
-
- (1) If the end of input is recognized, the current token shall be
- delimited. If there is no current token, the end-of-input
- indicator shall be returned as the token.
-
- (2) If the previous character was used as part of an operator and
- the current character is not quoted and can be used with the
- current characters to form an operator, it shall be used as part
- of that (operator) token.
-
- (3) If the previous character was used as part of an operator and
- the current character cannot be used with the current characters
- to form an operator, the operator containing the previous
- character shall be delimited.
-
- (4) If the current character is backslash, single-quote, or double-
- quote (\, ', or ") and it is not quoted, it shall affect quoting
- for subsequent character(s) up to the end of the quoted text.
- The rules for quoting are as described in 3.2. During token
- recognition no substitutions shall be actually performed, and
- the result token shall contain exactly the characters that
- appear in the input (except for <newline> joining), unmodified,
- including any embedded or enclosing quotes or substitution
- operators, between the quote mark and the end of the quoted
- text. The token shall not be delimited by the end of the quoted
- field.
-
- (5) If the current character is an unquoted $ or `, the shell shall
- identify the start of any candidates for parameter expansion
- (3.6.2), command substitution (3.6.3), or arithmetic expansion
- (3.6.4) from their introductory unquoted character sequences: $
- or ${, $( or `, and $((, respectively. The shell shall read
- sufficient input to determine the end of the unit to be expanded
- (as explained in the cited subclauses). While processing the
- characters, if instances of expansions or quoting are found
- nested within the substitution, the shell shall recursively
- process them in the manner specified for the construct that is
- found. The characters found from the beginning of the
- substitution to its end, allowing for any recursion necessary to
- recognize embedded constructs, shall be included unmodified in
- the result token, including any embedded or enclosing
- substitution operators or quotes. The token shall not be
- delimited by the end of the substitution.
-
- (6) If the current character is not quoted and can be used as the
- first character of a new operator, the current token (if any)
- shall be delimited. The current character shall be used as the
- beginning of the next (operator) token.
-
- (7) If the current character is an unquoted <newline>, the current
- token shall be delimited.
-
- (8) If the current character is an unquoted <blank>, any token
- containing the previous character is delimited and the current
- character is discarded.
-
- (9) If the previous character was part of a word, the current
- character is appended to that word.
-
- (10) If the current character is a #, it and all subsequent
- characters up to, but excluding, the next <newline> are
- discarded as a comment. The <newline> that ends the line is not
- considered part of the comment.
-
- (11) The current character is used as the start of a new word.
-
- Once a token is delimited, it shall be categorized as required by the
- grammar in 3.10.
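-
- (Informative sketch, not part of the draft text.) Rules (4), (6), and
- (10) can be observed from the shell itself. The variable names below
- are illustrative assumptions, and the results shown in the comments
- assume a shell that follows the rules above.

```shell
# Rule (10): '#' starts a comment only at the beginning of a token.
set -- foo#bar          # '#' is inside the word, so it is an ordinary character
inside=$1               # foo#bar
set -- foo #bar         # '#bar' begins a new token and is discarded as a comment
at_start=$*             # foo

# Rule (6): the operator ';' delimits the preceding word without any <blank>.
two_cmds=$(echo one;echo two)

# Rule (4): quoting keeps <blank> and '#' inside a single token.
quoted=$(echo 'a  #b')

printf '%s\n' "$inside" "$at_start" "$two_cmds" "$quoted"
```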
-
- BEGIN_RATIONALE
-
-
- 3.3.1 Token Recognition Rationale. (This subclause is not a part of
- P1003.2)
-
- The (3) rule about combining characters to form operators is not meant to 1
- preclude systems from extending the shell language when characters are 1
- combined in otherwise invalid ways. Portable applications cannot use 1
- invalid combinations and test suites should not penalize systems that 1
- take advantage of this fact. For example, the unquoted combination |& is 1
- not valid in a POSIX.2 script, but has a specific KornShell meaning. 1
-
- Rule (10), about # as the current character, is the first rule in the
- sequence that applies when a new token is being assembled, so the #
- starts a comment only when it is at the beginning of a token. This rule
- is also written to indicate that the search for the end of the comment
- does not treat an escaped <newline> specially, so that a comment cannot
- be continued to the next line.
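-
- A minimal sketch of the last point, assuming a shell that follows rule
- (10); the variable name result is illustrative only. The backslash at
- the end of the comment does not join the next line to the comment, so
- the assignment is still executed.

```shell
result=unchanged
# This comment ends with a backslash, but it is not continued \
result=assigned
echo "$result"
```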
-
- END_RATIONALE
-
-
-
- 3.4 Reserved Words
-
- Reserved words are words that have special meaning to the shell. (See
- 3.9.) The following words shall be recognized as reserved words:
-
- !       elif    fi      in      while
- case    else    for     then    { 4)
- do      esac    if      until   }
- done
-
- This recognition shall occur only when none of the characters are quoted
- and when the word is used as:
-
- (1) The first word of a command
-
- (2) The first word following one of the reserved words other than
- case, for, or in
-
- (3) The third word in a case or for command (only in is valid in
- this case)
-
- See the grammar in 3.10.
-
- The following words may be recognized as reserved words on some systems
- (when none of the characters are quoted), causing unspecified results:
-
- function select [[ ]] 2
-
- Words that are the concatenation of a name and a colon (:) are reserved;
- their use produces unspecified results.
-
- BEGIN_RATIONALE
-
-
- 3.4.1 Reserved Words Rationale. (This subclause is not a part of
- P1003.2)
-
- All reserved words are recognized syntactically as such in the contexts
- described. However, it is useful to point out that in is the only
- meaningful reserved word after a case or for; similarly, in is not
- meaningful as the first word of a simple command.
-
- Reserved words are recognized only when they are delimited (i.e., meet
- the definition of word; see 3.1.16), whereas operators are themselves
- delimiters. For instance, ( and ) are control operators, so that no
- <space> is needed in (list). However, { and } are reserved words in
- { list;}, so that in this case the leading <space> and semicolon are
- required.
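-
- The difference between operators and reserved words can be sketched as
- follows. The variable names are assumptions made for illustration.

```shell
# '(' and ')' are control operators: no <space> is needed around the list.
paren=$( (echo subshell) )

# '{' and '}' are reserved words: the <space> after '{' and the ';'
# before '}' are required for them to be recognized.
brace=$( { echo group;} )

# Reserved words are recognized only in the positions listed in 3.4;
# as arguments they are ordinary words.
as_args=$(echo if then fi)

printf '%s\n' "$paren" "$brace" "$as_args"
```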
-
-
-
- __________
- 4) In some historical systems, the curly braces are treated as control
- operators. To assist in future standardization activities, portable
- applications should avoid using unquoted braces to represent the
- characters themselves. It is possible that a future version of
- POSIX.2 may require this, although probably not for the often-used
- find {} construct.
-
- The list of unspecified reserved words is from the KornShell, so portable
- applications cannot use them in places a reserved word would be
- recognized. This list contained time in earlier drafts, but it was 2
- removed when the time utility was selected for the UPE. 2
-
- There was a strong argument for promoting braces to operators (instead of
- reserved words), so they would be syntactically equivalent to subshell
- operators. Concerns about compatibility outweighed the advantages of
- this approach. Nevertheless, portable applications should consider
- quoting { and } when they represent themselves.
-
- The restriction on ending a name with a colon is to allow future
- implementations that support named labels for flow control. See the
- rationale for break (3.14.1.1).
-
- END_RATIONALE
-
-
-
- 3.5 Parameters and Variables
-
- A parameter can be denoted by a name, a number, or one of the special
- characters listed in 3.5.2. A variable is a parameter denoted by a name.
-
- A parameter is set if it has an assigned value (null is a valid value).
- Once a variable is set, it can only be unset by using the unset special
- built-in command.
-
-
- 3.5.1 Positional Parameters
-
- A positional parameter is a parameter denoted by the decimal value
- represented by one or more digits, other than the single digit 0. When a
- positional parameter with more than one digit is specified, the
- application shall enclose the digits in braces (see 3.6.2). Positional
- parameters are initially assigned when the shell is invoked (see sh in
- 4.56), temporarily replaced when a shell function is invoked (see 3.9.5),
- and can be reassigned with the set special built-in command.
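-
- (Informative sketch, not part of the draft text.) The brace requirement
- for multidigit positional parameters, and the temporary replacement
- during a function call, might be illustrated as below. The function
- name show_first and the variable names are hypothetical.

```shell
set -- a b c d e f g h i j k

tenth=${10}       # the tenth positional parameter: j
not_tenth=$10     # parameter 1 followed by the character 0: a0

# Inside a function the positional parameters are temporarily replaced.
show_first() { echo "$1"; }
in_func=$(show_first inner)   # inner
after=$1                      # a again once the function has returned

printf '%s\n' "$tenth" "$not_tenth" "$in_func" "$after"
```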
-
- BEGIN_RATIONALE
-
- 3.5.1.1 Positional Parameters Rationale. (This subclause is not a part
- of P1003.2)
-
- The digits denoting the positional parameters are always interpreted as a
- decimal value, even if there is a leading zero.
-
- END_RATIONALE
-
- 3.5.2 Special Parameters
-
- Listed below are the special parameters and the values to which they
- shall expand. Only the values of the special parameters are listed; see
- 3.6 for a detailed summary of all the stages involved in expanding words.
-
- * Expands to the positional parameters, starting from one. When
- the expansion occurs within a double-quoted string (see 3.2.3),
- it expands to a single field with the value of each parameter
- separated by the first character of the IFS variable, or by a
- <space> if IFS is unset.
-
- @ Expands to the positional parameters, starting from one. When
- the expansion occurs within double-quotes, each positional
- parameter expands as a separate field, with the provision that
- the expansion of the first parameter is still joined with the
- beginning part of the original word (assuming that the expanded
- parameter was embedded within a word), and the expansion of the
- last parameter is still joined with the last part of the
- original word. If there are no positional parameters, the 1
- expansion of @ shall generate zero fields, even when @ is 1
- double-quoted. 1
-
- # Expands to the decimal number of positional parameters.
-
- ? Expands to the decimal exit status of the most recent pipeline
- (see 3.9.2).
-
- - (Hyphen) Expands to the current option flags (the single-letter
- option names concatenated into a string) as specified on
- invocation, by the set special built-in command, or implicitly
- by the shell.
-
- $ Expands to the decimal process ID of the invoked shell. In a
- subshell (see 3.12), $ shall expand to the same value as that of
- the current shell.
-
- ! Expands to the decimal process ID of the most recent background
- command (see 3.9.3) executed from the current shell. For a 1
- pipeline, the process ID is that of the last command in the
- pipeline.
-
- 0 (Zero.) Expands to the name of the shell or shell script. See
- sh (4.56) for a detailed description of how this name is
- derived.
-
- See the description of the IFS variable in 3.5.3.
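-
- (Informative sketch, not part of the draft text.) Several of the special
- parameters can be checked directly. The helper count_fields and the
- variable names are assumptions made for this illustration.

```shell
count_fields() { echo $#; }

set -- alpha "beta gamma" delta
number=$#                            # 3
star_fields=$(count_fields "$*")     # one field
at_fields=$(count_fields "$@")       # one field per positional parameter
joined=$(IFS=:; echo "$*")           # joined with the first character of IFS

set --
no_fields=$(count_fields "$@")       # zero fields, even though double-quoted

status=0
(exit 3) || status=$?                # exit status of the most recent pipeline

main_pid=$$
sub_pid=$(echo $$)                   # a subshell reports the same value of $

printf '%s\n' "$number" "$star_fields" "$at_fields" "$joined" "$no_fields" "$status"
```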
-
- BEGIN_RATIONALE
-
-
- 3.5.2.1 Special Parameters Rationale. (This subclause is not a part of
- P1003.2)
-
- Most historical implementations implement subshells by forking; thus, the
- special parameter $ does not necessarily represent the process ID of the
- shell process executing the commands since the subshell execution
- environment preserves the value of $.
-
- If a subshell were to execute a background command, the value of its 1
- parent's $! would not change. For example: 1
-
- ( 1
- date & 1
- echo $! 1
- ) 1
- echo $! 1
-
- would echo two different values for $!. 1
-
- The descriptions of parameters * and @ assume the reader is familiar with
- the field splitting discussion in 3.6.5 and understands that portions of
- the word will remain concatenated unless there is some reason to split
- them into separate fields. Some examples of the * and @ properties,
- including the concatenation aspects:
-
- set "abc" "def ghi" "jkl"
-
- echo $* => "abc" "def" "ghi" "jkl"
- echo "$*" => "abc def ghi jkl"
- echo $@ => "abc" "def" "ghi" "jkl"
-
- but
-
- echo "$@" => "abc" "def ghi" "jkl"
- echo "xx$@yy" => "xxabc" "def ghi" "jklyy"
- echo "$@$@" => "abc" "def ghi" "jklabc" "def ghi" "jkl"
-
- In the preceding examples, the double-quote characters that appear after
- the => do not appear in the output and are used only to illustrate word
- boundaries.
-
- Historical versions of the Bourne shell have used <space> as a separator
- between the expanded members of "$*". The KornShell has used the first
- character in IFS, which is <space> by default. If IFS is set to a null 1
- string, this is not equivalent to unsetting it; its first character will 1
- not exist, so the parameter values are concatenated. For example: 1
-
- $ IFS='' 1
- $ set foo bar bam 1
- $ echo "$@" 1
- foo bar bam 1
- $ echo "$*" 1
- foobarbam 1
- $ unset IFS 1
- $ echo "$*" 1
- foo bar bam 1
-
- The $- can be used to save and restore set options:
-
- Save=$(echo $- | sed 's/[ics]//g') 1
- ...
- set +aCefnuvx 2
- set -$Save
-
- The three options are removed using sed in the example because they may 1
- appear in the value of $- (from the sh command line), but are not valid 1
- options to set. 1
-
- The command name (parameter 0) is not counted in the number given by #
- because it is a special parameter, not a positional parameter.
-
- END_RATIONALE
-
-
- 3.5.3 Variables
-
- Variables shall be initialized from the environment (as defined by
- POSIX.1 {8}) and can be given new values with variable assignment
- commands. If a variable is initialized from the environment, it shall be
- marked for export immediately; see 3.14.8. New variables can be defined
- and initialized with variable assignments, with the read or getopts
- utilities, with the name parameter in a for loop (see 3.9.4.2), with the
- ${name=word} expansion, or with other mechanisms provided as
- implementation extensions. The following variables shall affect the
- execution of the shell:
-
- HOME This variable shall be interpreted as the pathname
- of the user's home directory. The contents of HOME
- are used in Tilde Expansion (see 3.6.1).
-
- IFS Input field separators: a string treated as a list
- of characters that is used for field splitting and
- to split lines into fields with the read command.
- If IFS is not set, the shell shall behave as if the
- value of IFS were the <space>, <tab>, and <newline>
- characters. (See 3.6.5.)
-
- LANG This variable shall provide a default value for the
- LC_* variables, as described in 2.6.
-
- LC_ALL This variable shall interact with the LANG and LC_*
- variables as described in 2.6.
-
- LC_COLLATE This variable shall determine the behavior of range
- expressions, equivalence classes, and
- multicharacter collating elements within pattern
- matching.
-
- LC_CTYPE This variable shall determine the interpretation of
- sequences of bytes of text data as characters
- (e.g., single- versus multibyte characters), which
- characters are defined as letters (character class
- alpha), and the behavior of character classes
- within pattern matching.
-
- LC_MESSAGES This variable shall determine the language in which
- messages should be written.
-
- PATH This variable represents a string formatted as
- described in 2.6, used to effect command
- interpretation. See 3.9.1.1. 1
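-
- (Informative sketch, not part of the draft text.) Three of the ways of
- defining a variable named in 3.5.3, and the use of IFS by read, could
- look like this. All variable names and the sample input line are
- illustrative assumptions.

```shell
# Defined by the ${name=word} expansion.
unset greeting
: "${greeting=hello}"

# Defined as the name parameter of a for loop.
for item in x y z
do
    last=$item
done

# Defined by read; IFS is set only for the duration of the read command
# and splits the input line at the first colon.
IFS=: read -r user rest <<EOF
root:x:0:0
EOF

printf '%s\n' "$greeting" "$last" "$user" "$rest"
```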
-
- BEGIN_RATIONALE
-
-
- 3.5.3.1 Variables Rationale. (This subclause is not a part of P1003.2)
-
- A description of PWD (which is automatically set by the KornShell
- whenever the current working directory changes) was omitted because its
- functionality is easily reproduced using $(pwd).
-
- See the discussion of IFS in 3.6.5.1.
-
- Other common environment variables used by historical shells are not
- specified by this standard, but they should be reserved for the
- historical uses. For interactive use, other shell variables are expected
- to be introduced by the UPE (and this rationale will be updated
- accordingly): ENV, FCEDIT, HISTFILE, HISTSIZE, LINENO, PPID, PS1, PS2,
- PS4.
-
- Tilde expansion for components of the PATH in an assignment such as:
-
- PATH=~hlj/bin:~dwc/bin:$PATH 1
-
- is a feature of some historical shells and is allowed by the wording of 1
- 3.6.1. Note that the tildes are expanded during the assignment to PATH, 1
- not when PATH is accessed during command search. 1
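-
- A minimal sketch of this timing, assuming a shell that performs tilde
- expansion after colons in assignments. An ordinary variable named dirs
- is used instead of PATH so that command search is not disturbed, and the
- directory names are hypothetical.

```shell
saved_home=${HOME-}

HOME=/home/first
dirs=~/bin:~/lib      # both tildes are expanded now, using the current HOME

HOME=/home/second
later=$dirs           # unchanged: no tilde remains to be expanded

HOME=$saved_home
echo "$later"
```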
-
- END_RATIONALE 1
-
-
-
- 3.6 Word Expansions
-
- This clause describes the various expansions that are performed on words.
- Not all expansions are performed on every word, as explained in the
- following subclauses.
-
- Tilde expansions, parameter expansions, command substitutions, arithmetic
- expansions, and quote removals that occur within a single word expand to
- a single field. It is only field splitting or pathname expansion that
- can create multiple fields from a single word. The single exception to
- this rule is the expansion of the special parameter @ within double-
- quotes, as is described in 3.5.2.
-
- The order of word expansion shall be as follows:
-
- (1) Tilde Expansion (see 3.6.1), Parameter Expansion (see 3.6.2), 1
- Command Substitution (see 3.6.3), and Arithmetic Expansion (see
- 3.6.4) shall be performed, beginning to end. [See item (5) in
- 3.3.]
-
- (2) Field Splitting (see 3.6.5) shall be performed on fields
- generated by step (1) unless IFS is null.
-
- (3) Pathname Expansion (see 3.6.6) shall be performed, unless set -f
- is in effect.
-
- (4) Quote Removal (see 3.6.7) shall always be performed last.
-
- The expansions described in this clause shall occur in the same shell
- environment as that in which the command is executed.
-
- If the complete expansion appropriate for a word results in an empty
- field, that empty field shall be deleted from the list of fields that
- form the completely expanded command, unless the original word contained 1
- single-quote or double-quote characters. 1
-
- The $ character is used to introduce parameter expansion, command
- substitution, or arithmetic evaluation. If an unquoted $ is followed by
- a character that is none of the following, the result is unspecified: a
- numeric character, the name of one of the special parameters (see
- 3.5.2), a valid first character of a variable name, a left curly brace
- ({), or a left parenthesis.
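-
- (Informative sketch, not part of the draft text.) The effect of steps
- (2) and (3), and of the rule about empty fields, can be observed by
- counting the fields that reach a command. The helper count and the
- variable names are assumptions made for illustration.

```shell
count() { echo $#; }

v='one two  three'
split=$(count $v)             # field splitting applies: 3 fields
unsplit=$(count "$v")         # double-quoted: 1 field
ifs_null=$(IFS=; count $v)    # IFS is null, so step (2) is skipped: 1 field

empty=
dropped=$(count $empty)       # the empty field is deleted: 0 fields
kept=$(count "$empty")        # the word contained quotes: 1 empty field

literal=$(set -f; echo *)     # set -f suppresses pathname expansion

printf '%s\n' "$split" "$unsplit" "$ifs_null" "$dropped" "$kept" "$literal"
```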
-
- BEGIN_RATIONALE
-
-
- 3.6.0.1 Word Expansions Rationale. (This subclause is not a part of
- P1003.2)
-
- IFS is used for performing field splitting on the results of parameter
- and command substitution; it is not used for splitting all fields.
- Previous versions of the shell used it for splitting all fields during
- field splitting, but this has severe problems because the shell can no
- longer parse its own script. There are also important security
- implications caused by this behavior. All useful applications of IFS use
- it for parsing input of the read utility and for splitting the results of
- parameter and command substitution. New versions of the shell have fixed
- this bug, and POSIX.2 requires the corrected behavior.
-
- The rule concerning expansion to a single field requires that, if
- foo=abc and bar=def,
-
- "$foo""$bar"
-
- expands to the single field
-
- abcdef
-
- The rule concerning empty fields can be illustrated by:
-
- $ unset foo
- $ set $foo bar '' xyz "$foo" abc
- $ for i
- > do
- > echo "-$i-"
- > done
- -bar-
- --
- -xyz-
- --
- -abc-
-
- Step (1) indicates that Tilde Expansion, Parameter Expansion, Command 1
- Substitution, and Arithmetic Expansion are all processed simultaneously
- as they are scanned. For example, the following is valid arithmetic:
-
- x=1
- echo $(( $(echo 3)+$x ))
-
- An earlier draft stated that Tilde Expansion preceded the other steps, 1
- but this is not the case in known historical implementations; if it were, 1
- and a referenced home directory contained a $ character, expansions would 1
- result within the directory name. 1
-
- END_RATIONALE 1
-
-
- 3.6.1 Tilde Expansion
-
- A tilde-prefix consists of an unquoted tilde character at the beginning
- of a word, followed by all of the characters preceding the first unquoted 2
- slash in the word, or all the characters in the word if there is no 2
- slash. In an assignment (see 3.1.15), multiple tilde prefixes can be 2
- used: at the beginning of the word (i.e., following the equals-sign of 2
- the assignment) and/or following any unquoted colon. A tilde prefix in 2
- an assignment is terminated by the first unquoted colon or slash. If 2
- none of the characters in the tilde-prefix are quoted, the characters in 1
- the tilde-prefix following the tilde shall be treated as a possible login 1
- name from the user database (see POSIX.1 {8} Section 9). A portable 2
- login name cannot contain characters outside the set given in the 2
- description of the LOGNAME environment variable in POSIX.1 {8}. If the 2
- login name is null (i.e., the tilde-prefix contains only the tilde), the
- tilde-prefix shall be replaced by the value of the variable HOME. If
- HOME is unset, the results are unspecified. Otherwise, the tilde-prefix
- shall be replaced by a pathname of the home directory associated with the
- login name obtained using the equivalent of the POSIX.1 {8} getpwnam() 1
- function. If the system does not recognize the login name, the results 1
- are undefined.
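-
- (Informative sketch, not part of the draft text.) With a null login
- name the tilde-prefix is replaced by the value of HOME, and quoting any
- part of the tilde-prefix suppresses the expansion. The value given to
- HOME and the variable names below are hypothetical.

```shell
saved_home=${HOME-}
HOME=/home/demo

bare=~             # /home/demo
prefix=~/src       # /home/demo/src
quoted="~/src"     # the tilde is quoted: ~/src
escaped=\~/src     # the tilde is quoted: ~/src
midword=a~/src     # the tilde is not at the beginning of the word: a~/src

HOME=$saved_home
printf '%s\n' "$bare" "$prefix" "$quoted" "$escaped" "$midword"
```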
-
- BEGIN_RATIONALE
-
-
- 3.6.1.1 Tilde Expansion Rationale. (This subclause is not a part of
- P1003.2)
-
- The text about quoting of the word indicates that \~hlj/, ~h\lj/, 2
- ~"hlj"/, ~hlj\/, and ~hlj/ are not equivalent: only the last will cause 2
- tilde expansion. 2
-
- Tilde expansion generally occurs only at the beginning of words, but 2
- POSIX.2 has adopted an exception based on historical practice in the 2
- KornShell: 2
-
- PATH=/posix/bin:~dgk/bin 2
-
- is eligible for tilde expansion because tilde follows a colon and none of 2
- the relevant characters is quoted. Consideration was given to 2
- prohibiting this behavior because any of the following are reasonable 2
- substitutes: 2
-
- PATH=$(printf %s: ~rms/bin ~bfox/bin ...) 2
- PATH=$(printf %s ~karels/bin : ~bostic/bin) 2
- for Dir in ~maart/bin ~srb/bin ... 2
- do 2
- PATH=${PATH:+$PATH:}$Dir 2
- done 2
-
- (In the first command, any number of directory names are concatenated and 2
- separated with colons, but it may be undesirable to end the variable with 2
- a colon because this is an obsolescent means to include dot at the end of 2
- the PATH. In the second, explicit colons are used for each directory. 2
- In all cases, the shell performs tilde expansion on each directory 2
- because all are separate words to the shell.) 2
-
- The exception was included to avoid breaking numerous KornShell scripts 2
- and the habits of interactive users, even though variable assignments in 2
- scripts derived from other systems will have to use quoting in some cases 2
- to allow literal tildes in strings. (This latter problem should be 2
- relatively rare because only tildes preceding known login names in 2
- unquoted strings are affected.) 2
-
- Note that expressions in operands such as 2
-
- make -k mumble LIBDIR=~chet/lib 2
-
- do not qualify as shell variable assignments and tilde expansion is not 2
- performed (unless the command does so itself, which make does not). 2
-
- In an earlier draft, tilde expansion occurred following any unquoted 2
- equals-sign or colon, but this was removed because of its complexity and 2
- to avoid breaking commands such as: 2
-
- rcp hostname:~marc/.profile . 2
-
- A suggestion was made that the special sequence ``$~'' should be allowed 2
- to force tilde expansion anywhere. Since this is not historical 2
- practice, it has been left for future implementations to evaluate. (The 2
- description in 3.2 requires that a dollar-sign be quoted to represent 2
- itself, so the $~ combination is already unspecified.) 2
-
- The results of giving tilde with an unknown login name are undefined
- because the KornShell ~+ and ~- constructs make use of this condition,
- but in general it is an error to give an incorrect login name with tilde.
- The results of having HOME unset are unspecified because some historical
- shells treat this as an error.
-
- END_RATIONALE
-
-
- 3.6.2 Parameter Expansion
-
- The format for parameter expansion is as follows:
-
- ${expression}
-
- where expression consists of all characters until the matching }. Any } 2
- escaped by a backslash or within a quoted string, and characters in 2
- embedded arithmetic expansions, command substitutions, and variable 2
- expansions, shall not be examined in determining the matching }.
-
- The simplest form for parameter expansion is:
-
- ${parameter}
-
- The value, if any, of parameter shall be substituted.
-
- The parameter name or symbol can be enclosed in braces, which are
- optional except for positional parameters with more than one digit or
- when parameter is followed by a character that could be interpreted as
- part of the name. The matching closing brace shall be determined by
- counting brace levels, skipping over enclosed quoted strings and command
- substitutions.
-
- If the parameter name or symbol is not enclosed in braces, the expansion
- shall use the longest valid name (see 3.1.5), whether or not the symbol
- represented by that name exists. If a parameter expansion occurs inside
- double-quotes:
-
- - Pathname expansion shall not be performed on the results of the
- expansion.
-
- - Field splitting shall not be performed on the results of the
- expansion, with the exception of @; see 3.5.2.
-
- In addition, a parameter expansion can be modified by using one of the
- following formats. In each case that a value of word is needed (based on
- the state of parameter, as described below), word shall be subjected to
- tilde expansion, parameter expansion, command substitution, and
- arithmetic expansion. If word is not needed, it shall not be expanded.
- The } character that delimits the following parameter expansion 1
- modifications shall be determined as described previously in this 1
- subclause and in 3.2.3. (For example, ${foo-bar}xyz} would result in the 1
- expansion of foo followed by the string xyz} if foo is set, else the
- string barxyz}.)
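-
- (Informative sketch, not part of the draft text.) The parenthetical
- example, and the longest-valid-name rule stated earlier in this
- subclause, can be tried as follows. The names F and Fred and the result
- variables are illustrative assumptions.

```shell
# The first unquoted } ends the expansion; the rest of the word is literal.
unset foo
when_unset=${foo-bar}xyz}     # barxyz}
foo=value
when_set=${foo-bar}xyz}       # valuexyz}

# Without braces the longest valid name is used, whether or not it is set.
F=short
unset Fred
longest="<$Fred>"             # Fred is unset, so this is <>
braced="<${F}red>"            # braces end the name after F: <shortred>

printf '%s\n' "$when_unset" "$when_set" "$longest" "$braced"
```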
-
- ${parameter:-word} Use Default Values. If parameter is unset or
- null, the expansion of word shall be
- substituted; otherwise, the value of
- parameter shall be substituted.
-
- ${parameter:=word} Assign Default Values. If parameter is unset
- or null, the expansion of word shall be
- assigned to parameter. In all cases, the
- final value of parameter shall be
- substituted. Only variables, not positional
- parameters or special parameters, can be
- assigned in this way.
-
- ${parameter:?[word]} Indicate Error if Null or Unset. If
- parameter is unset or null, the expansion of
- word (or a message indicating it is unset if
- word is omitted) shall be written to standard
- error and the shell shall exit with a nonzero
- exit status. Otherwise, the value of
- parameter shall be substituted. An
- interactive shell need not exit.
-
- ${parameter:+word} Use Alternate Value. If parameter is unset
- or null, null shall be substituted;
- otherwise, the expansion of word shall be
- substituted.
-
- In the parameter expansions shown previously, use of the colon in the
- format results in a test for a parameter that is unset or null; omission
- of the colon results in a test for a parameter that is only unset.
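-
- (Informative sketch, not part of the draft text.) The effect of the
- colon can be seen with one unset, one null, and one non-null variable.
- The variable names are assumptions, and the :? form is run in a
- subshell so that the error does not end the calling shell.

```shell
unset u          # unset
n=               # set but null
s=value          # set and not null

d_unset=${u:-default}            # default
d_null=${n:-default}             # default
d_null_nocolon=${n-default}      # null: without the colon only "unset" is tested
d_set=${s:-default}              # value

: "${u:=assigned}"               # u is now set to assigned

alt_set=${s:+alternate}          # alternate
alt_null=${n:+alternate}         # null
alt_null_nocolon=${n+alternate}  # alternate: n is set

if (unset m; : "${m:?m is not set}") 2>/dev/null
then guard=continued
else guard=stopped               # the subshell exited with a nonzero status
fi

printf '%s\n' "$d_unset" "$d_null" "$d_set" "$u" "$alt_set" "$guard"
```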
-
- ${#parameter} String Length. The length in characters of
- the value of parameter. If parameter is * or
- @, the result of the expansion is
- unspecified.
-
- The following four varieties of parameter expansion provide for substring
- processing. In each case, pattern matching notation (see 3.13), rather
- than regular expression notation, shall be used to evaluate the patterns.
- If parameter is * or @, the result of the expansion is unspecified.
- Enclosing the full parameter expansion string in double-quotes shall not 1
- cause the following four varieties of pattern characters to be quoted, 1
- whereas quoting characters within the braces shall have this effect.
-
- ${parameter%word} Remove Smallest Suffix Pattern. The word
- shall be expanded to produce a pattern. The
- parameter expansion then shall result in
- parameter, with the smallest portion of the
- suffix matched by the pattern deleted.
-
- ${parameter%%word} Remove Largest Suffix Pattern. The word
- shall be expanded to produce a pattern. The
- parameter expansion then shall result in
- parameter, with the largest portion of the
- suffix matched by the pattern deleted.
-
- ${parameter#word} Remove Smallest Prefix Pattern. The word
- shall be expanded to produce a pattern. The
- parameter expansion then shall result in
- parameter, with the smallest portion of the
- prefix matched by the pattern deleted.
-
- ${parameter##word} Remove Largest Prefix Pattern. The word
- shall be expanded to produce a pattern. The
- parameter expansion then shall result in
- parameter, with the largest portion of the
- prefix matched by the pattern deleted.
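-
- (Informative sketch, not part of the draft text.) The string length and
- the four substring forms are often combined to take a pathname apart.
- The pathname and the variable names below are hypothetical.

```shell
path=/usr/local/lib/libdemo.so.1

len=${#path}          # 27 characters
base=${path##*/}      # largest prefix matching */ removed:   libdemo.so.1
dir=${path%/*}        # smallest suffix matching /* removed:  /usr/local/lib
stem=${base%%.*}      # largest suffix matching .* removed:   libdemo
ext=${base#*.}        # smallest prefix matching *. removed:  so.1

printf '%s\n' "$len" "$base" "$dir" "$stem" "$ext"
```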
-
- BEGIN_RATIONALE
-
-
- 3.6.2.1 Parameter Expansion Rationale. (This subclause is not a part of
- P1003.2)
-
- When the shell is scanning its input to determine the boundaries of a
- name, it is not bound by its knowledge of what names are already defined.
- For example, if F is a defined shell variable, the command "echo $Fred"
- does not echo the value of $F followed by red; it selects the longest
- possible valid name, Fred, which in this case might be unset.
-
- The rule for finding the closing } in ${...} is the one used in the
- KornShell and is upward compatible with the Bourne shell, which does not
- determine the closing } until the word is expanded. The advantage of
- this is that incomplete expansions, such as
-
- ${foo
-
- can be determined during tokenization, rather than during expansion.
-
- The four expansions with the optional colon have been hard to understand
- from the historical documentation. The following table summarizes the
- effect of the colon:
-
-                       parameter              parameter              parameter
-                       set and not null       set but null           unset
-                       ____________________   ____________________   _______________
- ${parameter:-word}    substitute parameter   substitute word        substitute word
- ${parameter-word}     substitute parameter   substitute null        substitute word
- ${parameter:=word}    substitute parameter   assign word            assign word
- ${parameter=word}     substitute parameter   substitute parameter   assign word
- ${parameter:?word}    substitute parameter   error, exit            error, exit
- ${parameter?word}     substitute parameter   substitute null        error, exit
- ${parameter:+word}    substitute word        substitute null        substitute null   1
- ${parameter+word}     substitute word        substitute word        substitute null   1
-
-
- In all cases shown with ``substitute,'' the expression is replaced with
- the value shown. In all cases shown with ``assign,'' parameter is
- assigned that value, which also replaces the expression.
-
- The string length and substring capabilities were included because of the
- demonstrated need for them, based on their usage in other shells, such as
- C-shell and KornShell.
-
- Historical versions of the KornShell have not performed tilde expansion
- on the word part of parameter expansion; however, it is more consistent
- to do so.
-
- Examples
-
- ${parameter:-word}
-
- In this example, ls is executed only if x is null or
- unset. [The $(ls) command substitution notation is
- explained in 3.6.3.]
-
- ${x:-$(ls)}
-
- ${parameter:=word}
-
- unset X
- echo ${X:=abc}
- abc
-
- ${parameter:?word}
-
- unset posix
- echo ${posix:?}
- sh: posix: parameter null or not set
-
- ${parameter:+word}
-
- set a b c
- echo ${3:+posix}
- posix
-
- ${#parameter}
-
- HOME=/usr/posix
- echo ${#HOME}
- 10
-
- ${parameter%word}
-
- x=file.c
- echo ${x%.c}.o
- file.o
-
- ${parameter%%word}
-
- x=posix/src/std
- echo ${x%%/*}
- posix
-
- ${parameter#word}
-
- x=$HOME/src/cmd
- echo ${x#$HOME}
- /src/cmd
-
- ${parameter##word}
-
- x=/one/two/three
- echo ${x##*/}
- three
-
- The double-quoting of patterns is different depending on where the
- double-quotes are placed:
-
- "${x#*}" The asterisk is a pattern character.
-
- ${x#"*"} The literal asterisk is quoted and not special.
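-
- The difference between the two placements can be seen with a value that
- begins with an asterisk. This is an illustrative sketch only; the value
- assigned to x is made up for the purpose.

```shell
x='*star'

# Outer double-quotes: the asterisk is still a pattern. Its smallest match
# is the empty string, so nothing is removed.
unquoted_pattern="${x#*}"

# The asterisk itself is quoted: it matches only a literal asterisk.
quoted_pattern=${x#"*"}

printf '%s\n' "$unquoted_pattern" "$quoted_pattern"
```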
-
- END_RATIONALE
-
-
- 3.6.3 Command Substitution
-
- Command substitution allows the output of a command to be substituted in
- place of the command name itself. Command substitution shall occur when
- the command is enclosed as follows:
-
- $(command)
-
- or (``backquoted'' version):
-
- `command`
-
- The shell shall expand the command substitution by executing command in a
- subshell environment (see 3.12) and replacing the command substitution
- [the text of command plus the enclosing $( ) or backquotes] with the
- standard output of the command, removing sequences of one or more
- <newline>s at the end of the substitution. (Embedded <newline>s before
- the end of the output shall not be removed; however, during field
- splitting, they may be translated into <space>s, depending on the value
- of IFS and quoting that is in effect.)
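-
- The treatment of <newline>s can be sketched as follows. The example is
- illustrative only and assumes the default value of IFS; the variable names
- are hypothetical.

```shell
# The command writes an embedded newline and three trailing newlines.
out=$(printf 'one\ntwo\n\n\n')

# The trailing newlines are removed; the embedded one is kept.
len=${#out}          # "one" + <newline> + "two" is 7 characters

# Unquoted, the result is field-split at the embedded newline.
set -- $out
fields=$#

echo "length=$len fields=$fields"
```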
-
- Within the backquoted style of command substitution, backslash shall
- retain its literal meaning, except when followed by
-
- $ ` \
-
- (dollar-sign, backquote, backslash). The search for the matching 2
- backquote shall be satisfied by the first backquote found without a 2
- preceding backslash; during this search, if a nonescaped backquote is 2
- encountered within a shell comment, a here-document, an embedded command 2
- substitution of the $(command) form, or a quoted string, undefined 2
- results occur. A single- or double-quoted string that begins, but does
- not end, within the `...` sequence produces undefined results.
-
- With the $(command) form, all characters following the open parenthesis
- to the matching closing parenthesis constitute the command. Any valid 2
- shell script can be used for command, except: 2
-
- - A script consisting solely of redirections produces unspecified 2
- results. 2
-
- - See the restriction on single subshells described below. 2
-
- The results of command substitution shall not be processed for further 1
- tilde expansion, parameter expansion, command substitution, or arithmetic 1
- expansion. If a command substitution occurs inside double-quotes, field
- splitting and pathname expansion shall not be performed on the results of
- the substitution.
-
- Command substitution can be nested. To specify nesting within the
- backquoted version, the application shall precede the inner backquotes
- with backslashes; for example,
-
- \`command\`
-
- If the command substitution consists of a single subshell, such as
-
- $( (command) )
-
- a conforming application shall separate the $( and ( into two tokens
- (i.e., separate them with white space).
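-
- Nesting in both styles, and the white space required before a lone
- subshell, might look like the following sketch. The pathname is
- hypothetical and need not exist, because basename and dirname operate on
- strings.

```shell
path=/usr/posix/src

# Backquoted form: the inner backquotes are preceded by backslashes.
old_style=`basename \`dirname $path\``

# $( ) form: nesting needs no escaping.
new_style=$(basename $(dirname $path))

# A single subshell inside a command substitution: "$(" and "(" are
# separated by white space so that they are two tokens.
sub=$( (cd / && pwd) )

echo "$old_style $new_style $sub"
```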
-
- BEGIN_RATIONALE
-
-
- 3.6.3.1 Command Substitution Rationale. (This subclause is not a part of
- P1003.2)
-
- The new $( ) form of command substitution was adopted from the KornShell
- to solve a problem of inconsistent behavior when using backquotes. For
- example:
-
- Command                   Output
- ------------------------  ------
- echo '\$x'                \$x
- echo `echo '\$x'`         $x
- echo $(echo '\$x')        \$x
-
- Additionally, the backquoted syntax has historical restrictions on the 2
- contents of the embedded command. While the new $( ) form can process 2
- any kind of valid embedded script, the backquoted form cannot handle some 2
- valid scripts that include backquotes. For example, these otherwise 2
- valid embedded scripts do not work in the left column, but do work on the 2
- right: 2
-
- echo `                              echo $(
- cat <<\eof                          cat <<\eof
- a here-doc with `                   a here-doc with )
- eof                                 eof
- `                                   )
-
- echo `                              echo $(
- echo abc # a comment with `         echo abc # a comment with )
- `                                   )
-
- echo `                              echo $(
- echo '`'                            echo ')'
- `                                   )
-
- Some historical KornShell implementations did not process the first two 2
- examples correctly, but the author has agreed to make the appropriate 2
- modifications to do so. The KornShell will also be modified so that the 2
- following works: 2
-
- echo $( 2
- case word in 2
- [Ff]oo) echo found foo ;; 2
- esac 2
- ) 2
-
- Because of these inconsistent behaviors, the backquoted variety of
- command substitution is not recommended for new applications that nest
- command substitutions or attempt to embed complex scripts. Because of 2
- its widespread historical use, particularly by interactive users,
- however, the backquotes were retained in POSIX.2 without being declared
- obsolescent.
-
- The KornShell feature:
-
- If command is of the form <word, word is expanded to generate a
- pathname, and the value of the command substitution is the contents
- of this file with any trailing <newline>s deleted.
-
- was omitted from this standard because $(cat word) is an appropriate
- substitute. However, to prevent breaking numerous scripts relying on 2
- this feature, it is unspecified to have a script within $( ) that has 2
- only redirections. 2
-
- The requirement to separate $( and ( when a single subshell is command-
- substituted is to avoid any ambiguities with Arithmetic Expansion. See
- 3.6.4.1.
-
- END_RATIONALE
-
-
- 3.6.4 Arithmetic Expansion
-
- Arithmetic expansion provides a mechanism for evaluating an arithmetic
- expression and substituting its value. The format for arithmetic
- expansion shall be as follows:
-
- $((expression))
-
- The expression shall be treated as if it were in double-quotes, except
- that a double-quote inside the expression is not treated specially. The
- shell shall expand all tokens in the expression for parameter expansion,
- command substitution, and quote removal.
-
- Next, the shell shall treat this as an arithmetic expression and
- substitute the value of the expression. The arithmetic expression shall
- be processed according to the rules given in 2.9.2.1, with the following
- exceptions:
-
- (1) Only integer arithmetic is required.
-
- (2) The sizeof() operator and the prefix and postfix ++ and --
- operators are not required.
-
- (3) Selection, Iteration, and Jump Statements are not supported.
-
- As an extension, the shell may recognize arithmetic expressions beyond
- those listed. If the expression is invalid, the expansion fails and the
- shell shall write a message to standard error indicating the failure.
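-
- A few integer expressions illustrate the expansion. This is a sketch
- only; the variable names are hypothetical, and parameters are written
- with a leading $ so that they are expanded before the arithmetic is
- evaluated.

```shell
x=7
y=3

sum=$(($x + $y))       # 10
quot=$(($x / $y))      # integer arithmetic: 2
rem=$(($x % $y))       # 1
less=$(($x < $y))      # relational operators yield 1 or 0: here 0

# Command substitution inside the expression is expanded first.
mixed=$(( ($x + $y) * 2 - $(echo 4) ))   # (7 + 3) * 2 - 4 = 16

echo "$sum $quot $rem $less $mixed"
```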
-
- BEGIN_RATIONALE
-
-
- 3.6.4.1 Arithmetic Expansion Rationale. (This subclause is not a part of
- P1003.2)
-
- Numerous ballots were received objecting to the inclusion of the (( ))
- form of KornShell arithmetic in previous drafts. The developers of the
- standard concluded that there is a strong desire for some kind of
- arithmetic evaluator to replace expr, and that tying it in with $ makes
- it fit in nicely with the standard shell language, and provides access to
- arithmetic evaluation in places where accessing a utility would be
- inconvenient or clumsy.
-
- Following long debate by interested members of the balloting group, the
- syntax and semantics for arithmetic were changed. The language is
- essentially a pure arithmetic evaluator of constants and operators
- (excluding assignment) and represents a simple subset of the previous
- arithmetic language [which was derived from the KornShell's (( ))
- construct]. The syntax was changed from that of a command denoted by
- ((expression)), to an expansion denoted by $((expression)). The new form
- is a dollar expansion ($), which evaluates the expression and substitutes
- the resulting value. Objections to the previous style of arithmetic
- included that it was too complicated, did not fit in well with the
- shell's use of variables, and the syntax conflicted with subshells. The
- justification for the new syntax is that the shell is traditionally a
- macro language, and if a new feature is to be added, it should be done by
- extending the capabilities presented by the current model of the shell,
- rather than by inventing a new one outside the model: adding a new
- dollar expansion was perceived to be the most intuitive and least
- destructive way to add such a new capability.
-
- In Drafts 9 and 10, a form $[expression] was used. It was functionally
- equivalent to the $(( )) of the current text, but objections were lodged
- that the 1988 KornShell had already implemented $(( )) and there was no
- compelling reason to invent yet another syntax. Furthermore, the $[]
- syntax had a minor incompatibility involving the patterns in case
- statements.
-
- The portion of the C Standard {7} arithmetic operations selected
- corresponds to the operations historically supported in the KornShell.
-
- A simple example using arithmetic expansion:
-
- # repeat a command 100 times
- x=100
- while [ $x -gt 0 ]
- do
- command
- x=$(($x-1))
- done
-
- It was concluded that the test command ([) was sufficient for the
- majority of relational arithmetic tests, and that tests involving
- complicated relational expressions within the shell are rare, yet could
- still be accommodated by testing the value of $(()) itself. For example:
-
- # a complicated relational expression
- while [ $(( (($x + $y)/($a * $b)) < ($foo*$bar) )) -ne 0 ]
-
- or better yet, the rare script that has many complex relational
- expressions could define a function like this:
-
- val() {
- return $((!$1))
- }
-
- and complicated tests would be less intimidating:
-
- while val $(( (($x + $y)/($a * $b)) < ($foo*$bar) ))
- do
- # some calculations
- done
-
- Another suggestion was to modify true and false to take an optional
- argument, and true would exit true only if the argument is nonzero, and
- false would exit false only if the argument is nonzero. The suggestion
- was not favorably received by the balloting group (those contacted were
- negative about it, all others were silent in their latest ballots).
-
- while true $(($x > 5 && $y <= 25))
-
- There is a minor portability concern with the new syntax. The example
- $((2+2)) could have been intended to mean a command substitution of a
- utility named 2+2 in a subshell. The developers of POSIX.2 consider this
- to be obscure and isolated to some KornShell scripts [because $( )
- command substitution existed previously only in the KornShell]. The text
- on Command Substitution has been changed to require that the $( and ( be
- separate tokens if this usage is needed.
-
- An example such as
-
- echo $((echo hi);(echo there))
-
- should not be misinterpreted by the shell as arithmetic because attempts
- to balance the parentheses pairs would indicate that they are subshells. 1
- However, as indicated by 3.1.1, a conforming application must separate 1
- two adjacent parentheses with white space to indicate nested subshells. 1
-
- END_RATIONALE 1
-
-
- 3.6.5 Field Splitting
-
- After parameter expansion (3.6.2), command substitution (3.6.3), and
- arithmetic expansion (3.6.4), the shell shall scan the results of
- expansions and substitutions that did not occur in double-quotes for
- field splitting; multiple fields can result.
-
- The shell shall treat each character of the IFS as a delimiter and use
- the delimiters to split the results of parameter expansion and command
- substitution into fields.
-
- (1) If the value of IFS is <space>, <tab>, and <newline>, or if it
- is unset, any sequence of <space>, <tab>, or <newline>
- characters at the beginning or end of the input shall be ignored
- and any sequence of those characters within the input shall
- delimit a field. (For example, the input
-
- <newline><space><tab>foo<tab><tab>bar<space>
-
- yields two fields, foo and bar).
-
- (2) If the value of IFS is null, no field splitting shall be
- performed.
-
- (3) Otherwise, the following rules shall be applied in sequence. 1
- The term ``IFS white space'' is used to mean any sequence (zero 1
- or more instances) of white-space characters that are in the IFS 1
- value (e.g., if IFS contains <space><comma><tab>, any sequence 1
- of <space> and <tab> characters is considered IFS white space). 1
-
- (a) IFS white space shall be ignored at the beginning and end 1
- of the input. 1
-
- (b) Each occurrence in the input of an IFS character that is 1
- not IFS white space, along with any adjacent IFS white 1
- space, shall delimit a field, as described previously. 1
-
- (c) Nonzero-length IFS white space shall delimit a field. 1
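-
- Rules (1) and (2) can be observed with the input shown in rule (1). The
- following is a minimal sketch; count_fields is a hypothetical helper
- defined here, not a standard utility.

```shell
# Hypothetical helper: prints the number of fields it receives.
count_fields() { echo "$#"; }

# The input from rule (1): <newline><space><tab>foo<tab><tab>bar<space>
input=$(printf '\n \tfoo\t\tbar ')

unset IFS                               # treated as <space><tab><newline>
default_count=$(count_fields $input)    # foo and bar

IFS=                                    # null IFS: no field splitting
null_count=$(count_fields $input)       # the whole string is one field

unset IFS                               # back to the default behavior
echo "default IFS: $default_count, null IFS: $null_count"
```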
-
- BEGIN_RATIONALE
-
-
- 3.6.5.1 Field Splitting Rationale. (This subclause is not a part of
- P1003.2)
-
- The operation of field splitting using IFS as described in earlier drafts
- was based on the way the KornShell splits words, but is incompatible with
- other common versions of the shell. However, each has merit, and so a
- decision was made to allow both. If the IFS variable is unset, or is
- <space><tab><newline>, the operation is equivalent to the way the
- System V shell splits words. Using characters outside the
- <space><tab><newline> set yields the KornShell behavior, where each of
- the non-<space><tab><newline> characters is significant. This behavior,
- which affords the most flexibility, was taken from the way the original
- awk handled field splitting.
-
- The (3) rule can be summarized as a pseudo ERE: 1
-
- (s*ns*|s+) 1
-
- where s is an IFS white-space character and n is a character in the IFS 1
- that is not white space. Any string matching that ERE delimits a field, 1
- except that the s+ form does not delimit fields at the beginning or the 1
- end of a line. For example, if IFS is <space><comma>, the string 1
-
- <space><space>red<space><space>,<space>white<space>blue 1
-
- yields the three colors as the delimited fields. 1
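-
- The same string can be split by a shell to confirm the result. This is an
- illustrative sketch; the second case, with two adjacent commas, is an
- added example showing that each non-white-space delimiter delimits a
- field of its own.

```shell
IFS=' ,'

colors='  red  , white blue'
set -- $colors
color_count=$#
first=$1 second=$2 third=$3

# Two adjacent commas: the s*ns* form matches twice, leaving an empty
# field between them.
list='a,,b'
set -- $list
list_count=$#
middle=$2

unset IFS
echo "$color_count colors: $first $second $third; $list_count list fields"
```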
-
- END_RATIONALE 1
-
-
- 3.6.6 Pathname Expansion
-
- After field splitting, if set -f is not in effect, each field in the
- resulting command line shall be expanded using the algorithm described in
- 3.13, qualified by the rules in 3.13.3.
-
-
- 3.6.7 Quote Removal
-
- The quote characters
-
- \ ' "
-
- (backslash, single-quote, double-quote) that were present in the original
- word shall be removed unless they have themselves been quoted.
-
-
-
- 3.7 Redirection
-
- Redirection is used to open and close files for the current shell
- execution environment (see 3.12) or for any command. Redirection
- operators can be used with numbers representing file descriptors (see the
- definition in POSIX.1 {8}) as described below. See also 2.9.1. The
- relationship between these file descriptors and access to them in a
- programming language is specified in the language binding for that
- language to this standard.
-
- The overall format used for redirection is:
-
- [n]redir-op word
-
- The number n is an optional decimal number designating the file
- descriptor number; it shall be delimited from any preceding text and
- immediately precede the redirection operator redir-op. If n is quoted,
- the number shall not be recognized as part of the redirection expression.
- (For example, echo \2>a writes the character 2 into file a.) If any part
- of redir-op is quoted, no redirection expression shall be recognized.
- (For example, echo 2\>a writes the characters 2>a to standard output.)
- The optional number, redirection operator, and word shall not appear in
- the arguments provided to the command to be executed (if any).
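-
- The two quoting cases can be tried in a scratch directory. The sketch
- below assumes that a mktemp utility is available, which this draft does
- not specify; the variable names are hypothetical.

```shell
dir=$(mktemp -d)            # scratch directory (mktemp is an assumption)

# The quoted 2 is an ordinary argument, so standard output goes to file a.
echo \2>"$dir/a"
in_file=$(cat "$dir/a")

# The quoted > is not an operator, so nothing is redirected.
to_stdout=$(echo 2\>a)

rm -rf "$dir"
echo "file a holds [$in_file]; standard output received [$to_stdout]"
```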
-
- In this standard, open files are represented by decimal numbers starting
- with zero. It is implementation defined what the largest value can be;
- however, all implementations shall support at least 0 through 9 for use
- by the application. These numbers are called file descriptors. The
- values 0, 1, and 2 have special meaning and conventional uses and are
- implied by certain redirection operations; they are referred to as
- standard input, standard output, and standard error, respectively.
- Programs usually take their input from standard input, and write output
- on standard output. Error messages are usually written to standard
- error. The redirection operators can be preceded by one or more digits
- (with no intervening <blank>s allowed) to designate the file descriptor
- number.
-
- If the redirection operator is << or <<-, the word that follows the
- redirection operator shall be subjected to quote removal; it is
- unspecified whether any of the other expansions occur. For the other
- redirection operators, the word that follows the redirection operator
- shall be subjected to tilde expansion, parameter expansion, command
- substitution, arithmetic expansion, and quote removal. Pathname
- expansion shall not be performed on the word by a noninteractive shell;
- an interactive shell may perform it, but shall do so only when the
- expansion would result in one word.
-
- If more than one redirection operator is specified with a command, the
- order of evaluation is from beginning to end.
-
- In the following description of redirections, references are made to
- opening and creating files. These references shall conform to the
- requirements in 2.9.1.4. A failure to open or create a file shall cause
- the redirection to fail.
-
-
- 3.7.1 Redirecting Input
-
- Input redirection shall cause the file whose name results from the
- expansion of word to be opened for reading on the designated file
- descriptor, or standard input if the file descriptor is not specified.
-
- The general format for redirecting input is:
-
- [n]<word
-
- where the optional n represents the file descriptor number. If the
- number is omitted, the redirection shall refer to standard input (file
- descriptor 0).
-
-
- 3.7.2 Redirecting Output
-
- The two general formats for redirecting output are:
-
- [n]>word
- [n]>|word
-
- where the optional n represents the file descriptor number. If the
- number is omitted, the redirection shall refer to standard output (file
- descriptor 1).
-
- Output redirection using the > format shall fail if the noclobber option 1
- is set (see the description of set -C in 3.14.11) and the file named by 1
- the expansion of word exists and is a regular file. Otherwise, 1
- redirection using the > or >| formats shall cause the file whose name 1
- results from the expansion of word to be created and opened for output on
- the designated file descriptor, or standard output if none is specified.
- If the file does not exist, it shall be created; otherwise, it shall be
- truncated to be an empty file after being opened.
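-
- The interaction of > and >| with the noclobber option can be sketched as
- follows. The example assumes a mktemp utility and uses hypothetical file
- and variable names; the failing redirection is run in a subshell so that
- its diagnostic can be discarded.

```shell
dir=$(mktemp -d)            # scratch directory (mktemp is an assumption)
f="$dir/data"

echo first > "$f"           # noclobber not set: the file is created

set -C                      # turn on noclobber
if (echo second > "$f") 2>/dev/null; then
    clobbered=yes
else
    clobbered=no            # > fails: data exists and is a regular file
fi
after_refused=$(cat "$f")

echo third >| "$f"          # >| overrides noclobber and truncates the file
after_forced=$(cat "$f")
set +C

rm -rf "$dir"
echo "$clobbered $after_refused $after_forced"
```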
-
-
- 3.7.3 Appending Redirected Output
-
- Appended output redirection shall cause the file whose name results from
- the expansion of word to be opened for output on the designated file
- descriptor. The file is opened as if the POSIX.1 {8} open() function was
- called with the O_APPEND flag. If the file does not exist, it shall be
- created.
-
- The general format for appending redirected output is as follows:
-
- [n]>>word
-
- where the optional n represents the file descriptor number.
-
-
- 3.7.4 Here-Document
-
- The redirection operators << and <<- both allow redirection of lines
- contained in a shell input file, known as a here-document, to the
- standard input of a command.
-
- The here-document shall be treated as a single word that begins after the
- next <newline> and continues until there is a line containing only the
- delimiter, with no trailing <blank>s. Then the next here-document
- starts, if there is one. The format is as follows:
-
- [n]<<word
- here-document
- delimiter
-
- If any character in word is quoted, the delimiter shall be formed by
- performing quote removal on word, and the here-document lines shall not
- be expanded. Otherwise, the delimiter shall be the word itself.
-
- If no characters in word are quoted, all lines of the here-document shall
- be expanded for parameter expansion, command substitution, and arithmetic
- expansion. In this case, the backslash in the input shall behave as the
- backslash inside double-quotes (see 3.2.3). However, the double-quote
- character (") shall not be treated specially within a here-document,
- except when the double-quote appears within $( ), ` `, or ${ }. 1
-
- If the redirection symbol is <<-, all leading <tab> characters shall be
- stripped from input lines and the line containing the trailing delimiter.
- If more than one << or <<- operator is specified on a line, the here-
- document associated with the first operator shall be supplied first by
- the application and shall be read first by the shell.
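-
- The three here-document behaviors described above can be compared in one
- script. This is a sketch with hypothetical names; the <<- case is
- generated with printf and passed to sh so that the leading <tab>
- characters are explicit rather than dependent on how this text is stored.

```shell
name=world

# Unquoted delimiter: the body undergoes parameter and arithmetic expansion.
expanded=$(cat <<EOF
hello $name, 1+1=$((1+1))
EOF
)

# Quoted delimiter: the body is not expanded.
literal=$(cat <<'EOF'
hello $name
EOF
)

# <<- strips leading tabs from the body lines and from the delimiter line.
stripped=$(printf 'cat <<-END\n\tindented with a tab\n\tEND\n' | sh)

printf '%s\n' "$expanded" "$literal" "$stripped"
```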
-
-
- 3.7.5 Duplicating an Input File Descriptor
-
- The redirection operator
-
- [n]<&word
-
- is used to duplicate one input file descriptor from another, or to close
- one. If word evaluates to one or more digits, the file descriptor
- denoted by n, or standard input if n is not specified, shall be made to
- be a copy of the file descriptor denoted by word; if the digits in word
- do not represent a file descriptor already open for input, a redirection 1
- error shall result (see 3.8.1). If word evaluates to -, file descriptor 1
- n, or standard input if n is not specified, shall be closed. If word
- evaluates to something else, the behavior is unspecified.
-
-
- 3.7.6 Duplicating an Output File Descriptor
-
- The redirection operator
-
- [n]>&word
-
- is used to duplicate one output file descriptor from another, or to close
- one. If word evaluates to one or more digits, the file descriptor
- denoted by n, or standard output if n is not specified, shall be made to
- be a copy of the file descriptor denoted by word; if the digits in word
- do not represent a file descriptor already open for output, a redirection 1
- error shall result (see 3.8.1). If word evaluates to -, file descriptor 1
- n, or standard output if n is not specified, shall be closed. If word
- evaluates to something else, the behavior is unspecified.
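-
- Duplicating and closing output file descriptors might be exercised as in
- the following sketch. It assumes a mktemp utility and runs the descriptor
- manipulation inside a command substitution so that the descriptors of
- the calling shell are left alone; all names are hypothetical.

```shell
dir=$(mktemp -d)            # scratch directory (mktemp is an assumption)

report=$(
    exec 3> "$dir/out"      # open file descriptor 3 for output
    echo via-fd3 >&3        # standard output becomes a copy of 3
    exec 4>&3               # descriptor 4 becomes a copy of 3
    echo via-fd4 >&4
    exec 3>&- 4>&-          # close both descriptors

    # Descriptor 3 is no longer open, so this redirection is an error.
    if (echo x >&3) 2>/dev/null; then
        echo "fd 3 still open"
    else
        echo "fd 3 closed"
    fi
)

contents=$(cat "$dir/out")
rm -rf "$dir"
printf '%s\n' "$report" "$contents"
```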
-
-
- 3.7.7 Open File Descriptors for Reading and Writing
-
- The redirection operator
-
- [n]<>word
-
- shall cause the file whose name is the expansion of word to be opened for
- both reading and writing on the file descriptor denoted by n, or standard
- input if n is not specified. If the file does not exist, it shall be
- created.
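-
- A small experiment shows that <> opens an existing file without
- truncating it and creates a missing one. The sketch assumes a mktemp
- utility; the file names rw and new are hypothetical.

```shell
dir=$(mktemp -d)            # scratch directory (mktemp is an assumption)
printf 'abcdef\n' > "$dir/rw"

after_write=$(
    exec 3<> "$dir/rw"      # open for reading and writing; no truncation
    printf 'XY' >&3         # overwrites only the first two bytes
    exec 3>&-
    cat "$dir/rw"
)

: 3<> "$dir/new"            # the file does not exist, so it is created
if [ -f "$dir/new" ]; then created=yes; else created=no; fi

rm -rf "$dir"
echo "$after_write $created"
```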
-
- BEGIN_RATIONALE
-
-
- 3.7.8 Redirection Rationale. (This subclause is not a part of P1003.2)
-
- In the C binding for POSIX.1 {8}, file descriptors are integers in the
- range 0 - ({OPEN_MAX}-1). The file descriptors discussed in Redirection
- are that same set of small integers.
-
- As POSIX.2 is being finalized, it is not known how file descriptors will
- be represented in the language-independent description of POSIX.1 {8}.
- The current consensus appears to be that they will remain as small
- integers, but it is still possible that they will be defined as an opaque
- type. If they remain as integers, then the current POSIX.2 wording is
- acceptable. If they become an opaque type, then the C binding to
- POSIX.1 {8} will have to define the mapping from the binding's small
- integers to the opaque type, and the Redirection clause in POSIX.2 will
- have to be modified to specify that same mapping.
-
- Having multidigit file descriptor numbers for I/O redirection can cause
- some obscure compatibility problems. Specifically, scripts that depend
- on an example command:
-
- echo 22>/dev/null
-
- echoing "2" are somewhat broken to begin with. However, the file
- descriptor number still must be delimited from the preceding text. For
- example,
-
- cat file2>foo
-
- will write the contents of file2, not the contents of file.
-
- The >| format of output redirection was adopted from the KornShell.
- Along with the noclobber option, set -C, it provides a safety feature to
- prevent inadvertent overwriting of existing files. (See the rationale
- with the pathchk utility for why this step was taken.) The restriction
- on regular files is historical practice.
-
- The System V shell and the KornShell have differed historically on
- pathname expansion of word; the former never performed it, the latter
- only when the result was a single field (file). As a compromise, it was
- decided that the KornShell functionality was useful, but only as a
- shorthand device for interactive users. No reasonable shell script would
- be written with a command such as:
-
- cat foo > a*
-
- Thus, shell scripts are prohibited from doing it, while interactive users
- can select the shell with which they are most comfortable.
-
- The construct 2>&1 is often used to redirect standard error to the same
- file as standard output. Since the redirections take place beginning to
- end, the order of redirections is significant. For example:
-
- ls > foo 2>&1
-
- directs both standard output and standard error to file foo. However,
-
- ls 2>&1 > foo
-
- only directs standard output to file foo because standard error was
- duplicated as standard output before standard output was directed to file
- foo.
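-
- The effect of the ordering can be verified without ls by using a function
- that writes to both streams. In this sketch, both is a hypothetical
- helper and mktemp is assumed to be available.

```shell
# Hypothetical helper: one line to standard output, one to standard error.
both() { echo out; echo err >&2; }

dir=$(mktemp -d)            # scratch directory (mktemp is an assumption)

# Standard output is directed to foo first, then standard error is made a
# copy of it: both lines reach the file.
both > "$dir/foo" 2>&1
first_order=$(cat "$dir/foo")

# Standard error is made a copy of the original standard output (captured
# here by the command substitution) before standard output is redirected.
captured=$(both 2>&1 > "$dir/foo")
second_order=$(cat "$dir/foo")

rm -rf "$dir"
printf '%s\n' "file, first order: $first_order" \
              "file, second order: $second_order" "captured: $captured"
```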
-
- The <> operator is a feature first documented in the KornShell, but it
- has been silently present in both System V and BSD shells. It could be
- useful in writing an application that worked with several terminals, and
- occasionally wanted to start up a shell. That shell would in turn be
- unable to run applications that run from an ordinary controlling terminal 1
- unless it could make use of <> redirection. The specific example is a 1
- historical version of the pager more, which reads from standard error to
- get its commands, so standard input and standard output are both
- available for their usual usage. There is no way of saying the following
- in the shell without <>:
-
- cat food | more - >/dev/tty03 2<>/dev/tty03
-
- Another example of <> is one that opens /dev/tty on file descriptor 3 for
- reading and writing:
-
- exec 3<> /dev/tty
-
- An example of creating a lock file for a critical code region:
-
- set -C
- until 2> /dev/null > lockfile
- do sleep 30
- done
- set +C
- perform critical function
- rm lockfile
-
- Since /dev/null is not a regular file, no error is generated by
- redirecting to it in noclobber mode.
-
- The case of a missing delimiter at the end of a here-document is not
- specified. This is considered an error in the script (one that sometimes
- can be difficult to diagnose), although some systems have treated end-
- of-file as an implicit delimiter.
-
- Tilde expansion is not performed on a here-document because the data is 1
- treated as if it were enclosed in double-quotes. 1
-
- END_RATIONALE 1
-
-
-
- 3.8 Exit Status and Errors
-
-
- 3.8.1 Consequences of Shell Errors
-
- For a noninteractive shell, an error condition encountered by a special
- built-in (see 3.14) or other type of utility shall cause the shell to
- write a diagnostic message to standard error and exit as shown in the
- following table:
-
- Error                             Special Built-in   Other Utilities
- --------------------------------  -----------------  ---------------
- Shell language syntax error       shall exit         shall exit
- Utility syntax error (option      shall exit         shall not exit
-   or operand error)
- Redirection error                 shall exit         shall not exit
- Variable assignment error         shall exit         shall not exit
- Expansion error                   shall exit         shall exit
- Command not found                 n/a                may exit
- dot script not found              shall exit         n/a
-
- An ``expansion error'' is one that occurs when the shell expansions
- defined in 3.6 are carried out (e.g., ${x!y}, because ! is not a valid
- operator); an implementation may treat these as syntax errors if it is
- able to detect them during tokenization, rather than during expansion.
-
- If any of the errors shown as ``shall (may) exit'' occur in a subshell,
- the subshell shall (may) exit with a nonzero status, but the script
- containing the subshell shall not exit because of the error.
-
- In all of the cases shown in the table, an interactive shell shall write
- a diagnostic message to standard error without exiting.
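-
- A minimal sketch of two rows of the table, assuming a noninteractive
- shell. The pathname and variable names are illustrative assumptions. A
- redirection error on a regular utility makes only that command fail, and
- an expansion error inside a subshell terminates only the subshell:
-

```shell
# Redirection error on a regular utility: the command fails, the shell goes on.
redir_status=0
cat 2>/dev/null < /nonexistent_demo_dir/input || redir_status=$?

# Expansion error inside a subshell: only the subshell exits.
unset demo_var
sub_status=0
( : "${demo_var?demo_var is not set}" ) 2>/dev/null || sub_status=$?

echo "redirection status: $redir_status"
echo "subshell status: $sub_status (script still running)"
```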
-
-
- 3.8.2 Exit Status for Commands
-
- Each command has an exit status that can influence the behavior of other
- shell commands. The exit status of commands that are not utilities is
- documented in this subclause. The exit status of the standard utilities
- is documented in their respective clauses.
-
- If a command is not found by the shell, the exit status shall be 127. If
- the command name is found, but it is not an executable utility, the exit
- status shall be 126. See 3.9.1.1. Applications that invoke utilities
- without using the shell should use these exit status values to report
- similar errors.
-
- If a command fails during word expansion or redirection, its exit status
- shall be greater than zero.
-
- Internally, for purposes of deciding if a command exits with a nonzero
- exit status, the shell shall recognize the entire status value retrieved
- for the command by the equivalent of the POSIX.1 {8} wait() function
- WEXITSTATUS macro. When reporting the exit status with the special
- parameter ?, the shell shall report the full eight bits of exit status
- available. The exit status of a command that terminated because it
- received a signal shall be reported as greater than 128.
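-
- The three reserved ranges can be observed with a short sketch. The
- utility name no_such_utility_xyzzy and the temporary file name are
- hypothetical, and the exact value reported for a signal is
- implementation-dependent, so only "greater than 128" is assumed:
-

```shell
# 127: no utility of that name can be found (the name is hypothetical).
not_found=0
no_such_utility_xyzzy 2>/dev/null || not_found=$?

# 126: the file exists but is not an executable utility.
plain_file=${TMPDIR:-/tmp}/exit_status_demo.$$
: > "$plain_file"
chmod a-x "$plain_file"
not_executable=0
"$plain_file" 2>/dev/null || not_executable=$?
rm -f "$plain_file"

# Greater than 128: the child shell terminates itself with SIGTERM.
signaled=0
sh -c 'kill -TERM $$' 2>/dev/null || signaled=$?

echo "$not_found $not_executable $signaled"
```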
-
- BEGIN_RATIONALE
-
- 3.8.3 Exit Status and Errors Rationale. (This subclause is not a part of
- P1003.2)
-
- There is a historical difference in sh and ksh noninteractive error
- behavior. When a command named in a script is not found, some
- implementations of sh exit immediately, but ksh continues with the next
- command. Thus, POSIX.2 says that the shell ``may'' exit in this case.
- This puts a small burden on the programmer, who will have to test for
- successful completion following a command if it is important that the
- next command not be executed if the previous was not found. If it is
- important for the command to have been found, it was probably also
- important for it to complete successfully. The test for successful
- completion would not need to change.
-
- Historically, shells have returned an exit status of 128+n, where n
- represents the signal number. Since signal numbers are not standardized,
- there is no portable way to determine which signal caused the
- termination. Also, it is possible for a command to exit with a status in
- the same range of numbers that the shell would use to report that the
- command was terminated by a signal. Implementations are encouraged to
- choose exit values greater than 256 to indicate programs that were
- terminated by a signal so that the exit status cannot be confused with an
- exit status generated by a normal termination.
-
- Historical shells make the distinction between ``utility not found'' and
- ``utility found but cannot execute'' in their error messages. Specifying
- two seldom-used exit status values for these cases, 127 and 126
- respectively, gives an application the opportunity to make use of this
- distinction without having to parse an error message that would probably
- change from locale to locale. The POSIX.2 command, env, nohup, and xargs
- utilities also have been specified to use this convention.
-
- When a command fails during word expansion or redirection, most
- historical implementations exit with a status of 1. However, there was
- some sentiment that this value should probably be much higher, so that an
- application could distinguish this case from the more normal exit status
- values. Thus, the language ``greater than zero'' was selected to allow
- either method to be implemented.
-
- END_RATIONALE
-
-
-
- 3.9 Shell Commands
-
- This clause describes the basic structure of shell commands. The
- following command descriptions each describe a format of the command that
- is only used to aid the reader in recognizing the command type, and does
- not formally represent the syntax. Each description discusses the
- semantics of the command; for a formal description of the command
- language, consult the grammar in 3.10.
-
- A command is one of the following:
-
- - simple command (see 3.9.1)
-
- - pipeline (see 3.9.2)
-
- - list or compound-list (see 3.9.3)
-
- - compound command (see 3.9.4)
-
- - function definition (see 3.9.5).
-
- Unless otherwise stated, the exit status of a command is that of the last
- simple command executed by the command. There is no limit on the size of
- any shell command other than that imposed by the underlying system
- (memory constraints, {ARG_MAX}, etc.).
-
- BEGIN_RATIONALE
-
-
- 3.9.0.1 Shell Commands Rationale. (This subclause is not a part of
- P1003.2)
-
- A description of an ``empty command'' was removed from an earlier draft
- because it is only relevant in the cases of sh -c "", system(""), or an
- empty shell-script file (such as the implementation of true on some
- historical systems). Since it is no longer mentioned in POSIX.2, it
- falls into the silently unspecified category of behavior where
- implementations can continue to operate as they have historically, but
- conforming applications will not construct empty commands. (However,
- note that sh does explicitly state an exit status for an empty string or
- file.) In an interactive session or a script with other commands, extra
- <newline>s or semicolons, such as
-
- $ false
- $
- $ echo $?
- 1
-
- would not qualify as the empty command described here because they would
- be consumed by other parts of the grammar.
-
- END_RATIONALE
-
-
- 3.9.1 Simple Commands
-
- A simple command is a sequence of optional variable assignments and
- redirections, in any sequence, optionally followed by words and
- redirections, terminated by a control operator.
-
- When a given simple command is required to be executed (i.e., when any
- conditional construct such as an AND-OR list or a case statement has not
- bypassed the simple command), the following expansions, assignments, and
- redirections shall all be performed from the beginning of the command
- text to the end.
-
- (1) The words that are recognized as variable assignments or
- redirections according to 3.10.2 are saved for processing in
- steps (3) and (4).
-
- (2) The words that are not variable assignments or redirections
- shall be expanded. If any fields remain following their
- expansion, the first field shall be considered the command name,
- and remaining fields shall be the arguments for the command.
-
- (3) Redirections shall be performed as described in 3.7.
-
- (4) Each variable assignment shall be expanded for tilde expansion,
- parameter expansion, command substitution, arithmetic expansion,
- and quote removal prior to assigning the value.
-
- In the preceding list, the order of steps (3) and (4) may be reversed for
- the processing of special built-in utilities. See 3.14.
-
- If no command name results, variable assignments shall affect the current
- execution environment. Otherwise, the variable assignments shall be
- exported for the execution environment of the command and shall not
- affect the current execution environment (except for special built-ins).
- If any of the variable assignments attempt to assign a value to a read-
- only variable, a variable assignment error shall occur. See 3.8.1 for
- the consequences of these errors.
-
- If there is no command name, any redirections shall be performed in a
- subshell environment; it is unspecified whether this subshell environment
- is the same one as that used for a command substitution within the
- command. [To affect the current execution environment, see exec
- (3.14.6)]. If any of the redirections performed in the current shell
- execution environment fail, the command shall immediately fail with an
- exit status greater than zero, and the shell shall write an error message
- indicating the failure. See 3.8.1 for the consequences of these failures
- on interactive and noninteractive shells.
-
- If there is a command name, execution shall continue as described in
- 3.9.1.1. If there is no command name, but the command contained a
- command substitution, the command shall complete with the exit status of
- the last command substitution performed. Otherwise, the command shall
- complete with a zero exit status.
-
- BEGIN_RATIONALE
-
- 3.9.1.0.1 Simple Commands Rationale. (This subclause is not a part of
- P1003.2)
-
- The enumerated list is used only when the command is actually going to be
- executed. For example, in:
-
- true || $foo *
-
- no expansions are performed.
-
- The following example illustrates both how a variable assignment without
- a command name affects the current execution environment, and how an
- assignment with a command name only affects the execution environment of
- the command.
-
- $ x=red
- $ echo $x
- red
- $ export x
- $ sh -c 'echo $x'
- red
- $ x=blue sh -c 'echo $x'
- blue
- $ echo $x
- red
-
- This next example illustrates that redirections without a command name
- are still performed.
-
- $ ls foo
- ls: foo: no such file or directory
- $ > foo
- $ ls foo
- foo
-
- Historical practice is for a command without a command name, but that
- includes a command substitution, to have an exit status of the last
- command substitution that the shell performed and some historical scripts
- rely on this. For example:
-
- if x=$(command)
- then ...
- fi
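-
- A concrete, runnable variant of this idiom is sketched below. The
- variable names are illustrative; the subshell inside $( ) writes some
- output and then exits with status 3, and that status becomes the status
- of the assignment:
-

```shell
# The assignment has no command name, so its exit status is that of the
# last command substitution performed.
status=0
x=$(echo partial; exit 3) || status=$?
echo "x=$x status=$status"

# The same status drives an if construct.
if y=$(exit 0); then branch=then; else branch=else; fi
echo "branch=$branch"
```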
-
- An example of redirections without a command name being performed in a
- subshell shows that the here-document does not disrupt the standard input
- of the while loop:
-
- IFS=:
- while read a b
- do echo $a
- <<-eof
- Hello
- eof
- done </etc/passwd
-
- Some examples of commands without command names in AND/OR lists:
-
- > foo || {
- echo "error: foo cannot be created" >&2
- exit 1
- }
-
- # set saved if /vmunix.save exists
- test -f /vmunix.save && saved=1
-
- Command substitution and redirections without command names both occur in
- subshells, but they are not necessarily the same ones. For example, in:
-
- exec 3> file
- var=$(echo foo >&3) 3>&1
-
- it is unspecified whether foo will be echoed to the file or to standard
- output.
-
- END_RATIONALE
-
-
- 3.9.1.1 Command Search and Execution
-
- If a simple command results in a command name and an optional list of
- arguments, the following actions shall be performed.
-
- (1) If the command name does not contain any slashes, the first
- successful step in the following sequence shall occur:
-
- (a) If the command name matches the name of a special built-in
- utility, that special built-in utility shall be invoked.
-
- (b) If the command name matches the name of a function known
- to this shell, the function shall be invoked as described
- in 3.9.5. [If the implementation has provided a standard
- utility in the form of a function, it shall not be
- recognized at this point. It shall be invoked in
- conjunction with the path search in step (1)(d).]
-
- (c) If the command name matches the name of a utility listed
- in Table 2-2 (see 2.3), that utility shall be invoked.
-
- (d) Otherwise, the command shall be searched for using the
- PATH environment variable as described in 2.6:
-
- [1] If the search is successful:
-
- [a] If the system has implemented the utility as a
- regular built-in or as a shell function, it
- shall be invoked at this point in the path
- search.
-
- [b] Otherwise, the shell shall execute the utility
- in a separate utility environment (see 3.12)
- with actions equivalent to calling the
- POSIX.1 {8} execve() function with the path
- argument set to the pathname resulting from
- the search, arg0 set to the command name, and
- the remaining arguments set to the operands,
- if any.
-
- If the execve() function fails due to an error
- equivalent to the POSIX.1 {8} error [ENOEXEC],
- the shell shall execute a command equivalent
- to having a shell invoked with the command
- name as its first operand, along with any
- remaining arguments passed along. If the
- executable file is not a text file, the shell
- may bypass this command execution, write an
- error message, and return an exit status of
- 126.
-
- Once a utility has been searched for and found
- (either as a result of this specific search or as
- part of an unspecified shell startup activity), an
- implementation may remember its location and need
- not search for the utility again unless the PATH
- variable has been the subject of an assignment. If
- the remembered location fails for a subsequent
- invocation, the shell shall repeat the search to
- find the new location for the utility, if any.
-
- [2] If the search is unsuccessful, the command shall
- fail with an exit status of 127 and the shell shall
- write an error message.
-
- (2) If the command name does contain slashes, the shell shall
- execute the utility in a separate utility environment with
- actions equivalent to calling the POSIX.1 {8} execve() function
- with the path and arg0 arguments set to the command name, and
- the remaining arguments set to the operands, if any.
-
- If the execve() function fails due to an error equivalent to the
- POSIX.1 {8} error [ENOEXEC], the shell shall execute a command
- equivalent to having a shell invoked with the command name as
- its first operand, along with any remaining arguments passed
- along. If the executable file is not a text file, the shell may
- bypass this command execution, write an error message, and
- return an exit status of 126.
-
- BEGIN_RATIONALE
-
- 3.9.1.1.1 Command Search and Execution Rationale. (This subclause is not
- a part of P1003.2)
-
- This description requires that the shell can execute shell scripts
- directly, even if the underlying system does not support the common #!
- interpreter convention. That is, if file foo contains shell commands and
- is executable, the following will execute foo:
-
- ./foo
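-
- This fallback can be tried with a sketch such as the following, assuming
- the temporary directory permits execution. The directory name and the
- script contents are illustrative. The file deliberately has no #! line,
- so the shell itself is expected to run it after execve() fails:
-

```shell
demo_dir=${TMPDIR:-/tmp}/enoexec_demo.$$
mkdir "$demo_dir"

# A script with no "#!" line; execve() is expected to fail with [ENOEXEC].
printf '%s\n' 'echo "ran as a shell script: $1"' > "$demo_dir/foo"
chmod +x "$demo_dir/foo"

# The command name contains a slash, so no PATH search takes place.
output=$("$demo_dir/foo" hello)
rm -rf "$demo_dir"
echo "$output"
```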
-
- The command search shown here does not match all historical
- implementations. A more typical sequence has been:
-
- - Any built-in, special or regular.
-
- - Functions.
-
- - Path search for executable files.
-
- But there are problems with this sequence. Since the programmer has no
- idea in advance which utilities might have been built into the shell, a
- function cannot be used to portably override a utility of the same name.
- (For example, a function named cd cannot be written for many historical
- systems.) Furthermore, the PATH variable is partially ineffective in
- this case and only a pathname with a slash can be used to ensure a
- specific executable file is invoked.
-
- The sequence selected for POSIX.2 acknowledges that special built-ins
- cannot be overridden, but gives the programmer full control over which
- versions of other utilities are executed. It provides a means of
- suppressing function lookup (via the command utility; see 4.12) for the
- user's own functions and ensures that any regular built-ins or functions
- provided by the implementation are under the control of the path search.
- The mechanisms for associating built-ins or functions with executable
- files in the path are not specified by POSIX.2, but the wording requires
- that if either is implemented, the application will not be able to
- distinguish a function or built-in from an executable (other than in
- terms of performance, presumably). The implementation must ensure that
- all effects specified by POSIX.2 resulting from the invocation of the
- regular built-in or function (interaction with the environment,
- variables, traps, etc.) are identical to those resulting from the
- invocation of an executable file.
-
- Example: Consider three versions of the ls utility:
-
- (1) The application includes a shell function named ls.
-
- (2) The user writes her own utility named ls and puts it in /hsa/bin.
-
- (3) The example implementation provides ls as a regular shell built-in
- that will be invoked (either by the shell or directly by exec) when
- the path search reaches the directory /posix/bin.
-
- If PATH=/posix/bin, various invocations yield different versions of ls:
-
- Invocation                                        Version of ls
- ------------------------------------------------  ------------------
- ls (from within application script)               (1) function
- command ls (from within application script)       (3) built-in
- ls (from within makefile called by application)   (3) built-in
- system("ls")                                      (3) built-in
- PATH="/hsa/bin:$PATH" ls                          (2) user's version
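-
- The first two rows can be approximated on an ordinary system with the
- sketch below. It assumes only that an ls utility is reachable through
- PATH; the function body and variable names are illustrative:
-

```shell
# An application-defined function takes precedence over the path search.
ls() { echo "function ls"; }
via_name=$(ls)

# The command utility suppresses function lookup, so the utility runs.
via_command=$(command ls / >/dev/null 2>&1 && echo "utility ls")

unset -f ls
echo "$via_name / $via_command"
```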
-
- After the execve() failure described, the shell normally executes the
- file as a shell script. Some implementations, however, attempt to detect
- whether the file is actually a script and not an executable from some
- other architecture. The method used by the KornShell is allowed by the
- text that indicates nontext files may be bypassed.
-
- END_RATIONALE
-
-
- 3.9.2 Pipelines
-
- A pipeline is a sequence of one or more commands separated by the control
- operator |. The standard output of all but the last command shall be
- connected to the standard input of the next command.
-
- The format for a pipeline is:
-
- [!] command1 [ | command2 ...]
-
- The standard output of command1 shall be connected to the standard input
- of command2. The standard input, standard output, or both of a command
- shall be considered to be assigned by the pipeline before any redirection
- specified by redirection operators that are part of the command (see
- 3.7).
-
- If the pipeline is not in the background (see 3.9.3.1), the shell shall
- wait for the last command specified in the pipeline to complete, and may
- also wait for all commands to complete.
-
- Exit Status
-
- If the reserved word ! does not precede the pipeline, the exit status
- shall be the exit status of the last command specified in the pipeline.
- Otherwise, the exit status is the logical NOT of the exit status of the
- last command. That is, if the last command returns zero, the exit status
- shall be 1; if the last command returns greater than zero, the exit
- status is zero.
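-
- A minimal sketch of the logical NOT applied by the reserved word !
- (the variable names are illustrative):
-

```shell
# ! maps a zero status to 1 and any nonzero status to 0.
! true;     not_true=$?
! false;    not_false=$?
! (exit 4); not_four=$?
echo "$not_true $not_false $not_four"
```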
-
- BEGIN_RATIONALE
-
-
- 3.9.2.1 Pipelines Rationale. (This subclause is not a part of P1003.2)
-
- Because pipeline assignment of standard input or standard output or both
- takes place before redirection, it can be modified by redirection. For
- example:
-
- $ command1 2>&1 | command2
-
- sends both the standard output and standard error of command1 to the
- standard input of command2.
-
- The reserved word ! was added to allow more flexible testing using AND
- and OR lists.
-
- It was suggested that it would be better to return a nonzero value if any
- command in the pipeline terminates with nonzero status (perhaps the
- bitwise OR of all return values). However, the last-specified-command
- semantics are historical practice, and changing them would break
- applications. An example of historical (and POSIX.2)
- behavior:
-
- $ sleep 5 | (exit 4)
- $ echo $?
- 4
- $ (exit 4) | sleep 5
- $ echo $?
- 0
-
- END_RATIONALE
-
-
- 3.9.3 Lists
-
- An AND-OR-list is a sequence of one or more pipelines separated by the
- operators
-
- && ||
-
- A list is a sequence of one or more AND-OR-lists separated by the
- operators
-
- ; &
-
- and optionally terminated by
-
- ; & <newline>
-
- The operators && and || shall have equal precedence and shall be
- evaluated from beginning to end.
-
- A ; or <newline> terminator shall cause the preceding AND-OR-list to be
- executed sequentially; an & shall cause asynchronous execution of the
- preceding AND-OR-list.
-
- The term compound-list is derived from the grammar in 3.10; it is
- equivalent to a sequence of lists, separated by <newline>s, that can be
- preceded or followed by an arbitrary number of <newline>s.
-
- BEGIN_RATIONALE
-
- 3.9.3.0.1 Lists Rationale. (This subclause is not a part of P1003.2)
-
- The equal precedence of && and || is historical practice. The developers
- of the standard evaluated the model used more frequently in high level
- programming languages, such as C, to allow the shell logical operators to
- be used for complex expressions in an unambiguous way, but could not in
- the end allow existing scripts to break in the subtle way unequal
- precedence might cause. Some arguments were posed concerning the { } or
- ( ) groupings that are required historically. There are some
- disadvantages to these groupings:
-
- - The ( ) can be expensive, as they spawn other processes on some
- systems. This performance concern is primarily an implementation
- issue.
-
- - The { } braces are not operators (they are reserved words) and
- require a trailing space after each {, and a semicolon before each
- }. Most programmers (and certainly interactive users) have avoided
- braces as grouping constructs because of the irritating syntax
- required. Braces were not changed to operators because that would
- generate compatibility issues even greater than the precedence
- question; braces appear outside the context of a keyword in many
- shell scripts.
-
- An example reiterates the precedence of the lists as they associate from
- beginning to end. Both of the following commands write solely bar to
- standard output:
-
- false && echo foo || echo bar
- true || echo foo && echo bar
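-
- For contrast, a sketch that captures the output of the second command
- with and without explicit { } grouping (variable names are
- illustrative). With grouping, the right-hand side is skipped as a unit
- and nothing is written:
-

```shell
# Equal precedence, evaluated from beginning to end: (true || echo foo) && echo bar
left_to_right=$(true || echo foo && echo bar)

# Explicit grouping changes the association: true || (echo foo && echo bar)
grouped=$(true || { echo foo && echo bar; })

echo "ungrouped: [$left_to_right] grouped: [$grouped]"
```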
-
- The following is an example that illustrates <newline>s in compound-
- lists:
-
- while
- # a couple of newlines
-
- # a list
- date && who || ls; cat file
- # a couple of newlines
-
- # another list
- wc file > output & true
-
- do
- # 2 lists
- ls
- cat file
- done
-
- END_RATIONALE
-
-
- 3.9.3.1 Asynchronous Lists
-
- If a command is terminated by the control operator ampersand (&), the
- shell shall execute the command asynchronously in a subshell. This means
- that the shell shall not wait for the command to finish before executing
- the next command.
-
- The format for running a command in background is:
-
- command1 & [command2 & ...]
-
- The standard input for an asynchronous list, before any explicit
- redirections are performed, shall be considered to be assigned to a file
- that has the same properties as /dev/null. If it is an interactive
- shell, this need not happen. In all cases, explicit redirection of
- standard input shall override this activity.
-
- When an element of an asynchronous list (the portion of the list ended by
- an ampersand, such as command1, above) is started by the shell, the
- process ID of the last command in the asynchronous list element shall
- become known in the current shell execution environment; see 3.12. This
- process ID shall remain known until:
-
- - The command terminates and the application waits for the process
- ID, or
-
- - Another asynchronous list is invoked before $! (corresponding to
- the previous asynchronous list) is expanded in the current
- execution environment.
-
- The implementation need not retain more than the {CHILD_MAX} most recent
- entries in its list of known process IDs in the current shell execution
- environment.
-
- Exit Status
-
- The exit status of an asynchronous list shall be zero.
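-
- A minimal sketch distinguishing the exit status of the asynchronous list
- itself (zero) from the status of the command it started, which is
- retrieved later through $! and wait (variable names are illustrative):
-

```shell
# Start a subshell asynchronously; it will exit with status 7.
(exit 7) &
list_status=$?      # status of the asynchronous list, not of the child
child_pid=$!        # process ID remembered by the shell

# Waiting for the process ID yields the child's own exit status.
child_status=0
wait "$child_pid" || child_status=$?
echo "list: $list_status child: $child_status"
```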
-
- BEGIN_RATIONALE
-
- 3.9.3.1.1 Asynchronous Lists Rationale. (This subclause is not a part of
- P1003.2)
-
- The grammar treats a construct such as
-
- foo & bar & bam &
-
- as one ``asynchronous list,'' but since the status of each element is
- tracked by the shell, the term ``element of an asynchronous list'' was
- introduced to identify just one of the foo, bar, bam portions of the
- overall list.
-
- Unless the implementation has an internal limit, such as {CHILD_MAX}, on
- the retained process IDs, it would require unbounded memory for the
- following example:
-
- while true
- do foo & echo $!
- done
-
- The treatment of the signals SIGINT and SIGQUIT with asynchronous lists
- is described in 3.11.
-
- Since the connection of the input to the equivalent of /dev/null is
- considered to occur before redirections, the following script would
- produce no output:
-
- exec < /etc/passwd
- cat <&0 &
- wait
-
- END_RATIONALE
-
-
- 3.9.3.2 Sequential Lists
-
- Commands that are separated by a semicolon (;) shall be executed
- sequentially.
-
- The format for executing commands sequentially is:
-
- command1 [; command2] ...
-
- Each command shall be expanded and executed in the order specified.
-
- Exit Status
-
- The exit status of a sequential list shall be the exit status of the last
- command in the list.
-
- 3.9.3.3 AND Lists
-
- The control operator && shall denote an AND list. The format is:
-
- command1 [ && command2] ...
-
- First command1 is executed. If its exit status is zero, command2 is
- executed, and so on until a command has a nonzero exit status or there
- are no more commands left to execute. The commands shall be expanded
- only if they are executed.
-
- Exit Status
-
- The exit status of an AND list shall be the exit status of the last
- command that is executed in the list.
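-
- A sketch of the short-circuit behavior, using a hypothetical helper
- function step that records its first argument and returns its second
- argument as its exit status:
-

```shell
trace=""
# step NAME STATUS: append NAME to trace, then return STATUS.
step() { trace="$trace$1"; return "$2"; }

# b fails with status 3, so c is never executed.
step a 0 && step b 3 && step c 0
and_status=$?
echo "executed: $trace status: $and_status"
```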
-
- 3.9.3.4 OR Lists
-
- The control operator || shall denote an OR list. The format is:
-
- command1 [ || command2] ...
-
- First, command1 is executed. If its exit status is nonzero, command2 is
- executed, and so on until a command has a zero exit status or there are
- no more commands left to execute.
-
- Exit Status
-
- The exit status of an OR list shall be the exit status of the last
- command that is executed in the list.
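-
- The mirror-image behavior of an OR list can be sketched with a
- hypothetical helper function step that records its first argument and
- returns its second argument as its exit status:
-

```shell
trace=""
# step NAME STATUS: append NAME to trace, then return STATUS.
step() { trace="$trace$1"; return "$2"; }

# x fails, y succeeds, so z is never executed.
step x 2 || step y 0 || step z 0
or_status=$?
echo "executed: $trace status: $or_status"
```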
-
-
- 3.9.4 Compound Commands
-
- The shell has several programming constructs that are compound commands,
- which provide control flow for commands. Each of these compound commands
- has a reserved word or control operator at the beginning, and a
- corresponding terminator reserved word or operator at the end. In
- addition, each can be followed by redirections on the same line as the
- terminator. Each redirection shall apply to all the commands within the
- compound command that do not explicitly override that redirection.
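-
- A minimal sketch of a redirection attached to the terminator of a
- compound command, with one inner command overriding it. The temporary
- file name is an illustrative assumption:
-

```shell
out_file=${TMPDIR:-/tmp}/compound_redir_demo.$$

{
    echo first
    echo second > /dev/null    # explicit override of the outer redirection
    echo third
} > "$out_file"

captured=$(cat "$out_file")
rm -f "$out_file"
echo "$captured"
```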
-
-
- 3.9.4.1 Grouping Commands
-
- The format for grouping commands is as follows:
-
- (compound-list)      Execute compound-list in a subshell environment;
-                      see 3.12. Variable assignments and built-in
-                      commands that affect the environment shall not
-                      remain in effect after the list finishes.
-
- { compound-list;}    Execute compound-list in the current process
-                      environment.
-
- Exit Status
-
- The exit status of a grouping command shall be the exit status of list.
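-
- The difference between the two grouping forms can be sketched as
- follows (the variable name color is illustrative):
-

```shell
color=red

# Subshell environment: the assignment is discarded when the list finishes.
( color=blue )
after_subshell=$color

# Current environment: the assignment remains in effect.
{ color=green; }
after_braces=$color

echo "$after_subshell $after_braces"
```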
-
- BEGIN_RATIONALE
-
- 3.9.4.1.1 Grouping Commands Rationale. (This subclause is not a part of
- P1003.2)
-
- The semicolon shown in { compound-list;} is an example of a control
- operator delimiting the } reserved word. Other delimiters are possible,
- as shown in 3.10; <newline> is frequently used.
-
- A proposal was made to use the <do-done> construct in all cases where
- command grouping is performed in the current process environment,
- identifying it as a construct for the grouping commands, as well as for
- shell functions. This was not included because the shell
- already has a grouping construct for this purpose ({ }), and changing it
- would have been counter-productive.
-
- END_RATIONALE
-
-
- 3.9.4.2 for Loop
-
- The for loop shall execute a sequence of commands for each member in a
- list of items. The for loop requires that the reserved words do and done
- be used to delimit the sequence of commands.
-
- The format for the for loop is as follows.
-
- for name [ in word ... ]
- do
- compound-list
- done
-
- First, the list of words following in shall be expanded to generate a
- list of items. Then, the variable name shall be set to each item, in
- turn, and the compound-list executed each time. If no items result from
- the expansion, the compound-list shall not be executed. Omitting
-
- in word ...
-
- is equivalent to
-
- in "$@"
-
- _E_x_i_t__S_t_a_t_u_s
-
- The exit status of a for command shall be the exit status of the last
- command that executes. If there are no items, the exit status shall be
- zero.
-
- BEGIN_RATIONALE
-
- 3.9.4.2.1 for Loop Rationale. (This subclause is not a part of P1003.2)
-
- The format is shown with generous usage of <newline>s. See the grammar
- in 3.10 for a precise description of where <newline>s and semicolons can
- be interchanged.
-
- Some historical implementations support { and } as substitutes for do and
- done. The working group chose to omit them, even as an obsolescent
- feature. (Note that these substitutes were only for the for command; the
- while and until commands could not use them historically, because they
- are followed by compound-lists that may contain {...} grouping commands
- themselves.)
-
- The reserved word pair do ... done was selected rather than do ... od
- (which would have matched the spirit of if ... fi and case ... esac)
- because od is a commonly-used utility name and this would have been an
- unacceptable choice.
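-
- A minimal sketch of the for loop forms described in 3.9.4.2 follows.
- The variable names and the sample words are assumptions made for
- illustration.

```shell
# Illustrative names only; not taken from the standard.
set -- a "b c" d          # give the script three positional parameters

# Omitting "in word ..." iterates over "$@", so "b c" stays one item.
count=0
for arg
do
    count=$((count + 1))
    last=$arg
done

# An explicit word list.
joined=
for item in x y z
do
    joined="$joined$item"
done

# The list expands to no items: the body never runs and the status is zero.
empty=
ran=no
for none in $empty
do
    ran=yes
done
empty_status=$?
```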
-
- END_RATIONALE
-
-
- 3.9.4.3 case Conditional Construct
-
- The conditional construct case shall execute the compound-list
- corresponding to the first one of several patterns (see 3.13) that is
- matched by the string resulting from the tilde expansion, parameter
- expansion, command substitution, arithmetic expansion, and quote
- removal of the given word. The reserved word in shall denote the
- beginning of the patterns to be matched. Multiple patterns with the same
- compound-list are delimited by the | symbol. The control operator )
- terminates a list of patterns corresponding to a given action. The
- compound-list for each list of patterns is terminated with ;;. The case
- construct terminates with the reserved word esac (case reversed).
-
- The format for the case construct is as follows.
-
- case word in
- [(]pattern1) compound-list;;
- [(]pattern2|pattern3) compound-list;;
- ...
- esac
-
- The ;; is optional for the last compound-list.
-
- Each pattern in a pattern list shall be expanded and compared against the
- expansion of word. After the first match, no more patterns shall be
- expanded, and the compound-list shall be executed. The order of
- expansion and comparison of patterns in a multiple pattern list is
- unspecified.
-
- Exit Status
-
- The exit status of case shall be zero if no patterns are matched.
- Otherwise, the exit status shall be the exit status of the last command
- executed in the compound-list.
-
- BEGIN_RATIONALE
-
- 3.9.4.3.1 case Conditional Construct Rationale. (This subclause is not a
- part of P1003.2)
-
- An optional open-parenthesis before pattern was added to allow numerous
- historical KornShell scripts to conform. At one time, using the leading
- parenthesis was required if the case statement were to be embedded within
- a $( ) command substitution; this is no longer the case with the POSIX
- shell. Nevertheless, many existing scripts use the open-parenthesis, if
- only because it makes matching-parenthesis searching easier in vi and
- other editors. This is a relatively simple implementation change that is
- fully upward compatible for all scripts.
-
- Consideration was given to requiring break inside the compound-list to
- prevent falling through to the next pattern action list. This was
- rejected because it was not existing practice. An interesting undocumented
- feature of the KornShell is that using ;& instead of ;; as a terminator
- causes the exact opposite behavior--the flow of control continues with
- the next compound-list.
-
- The pattern "*", given as the last pattern in a case construct, is
- equivalent to the default case in a C-language switch statement.
-
- The grammar shows that reserved words can be used as patterns, even if
- one is the first word on a line. Obviously, the reserved word esac
- cannot be used in this manner.
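-
- The points above (the optional open-parenthesis, multiple patterns
- joined by |, and "*" as a default) can be sketched as follows. The
- variable names and sample words are assumptions for illustration.

```shell
# Illustrative names only; not taken from the standard.
result=
for reply in y no maybe
do
    case $reply in
        (yes|y) result="$result affirmative" ;;   # optional leading (
        no|n)   result="$result negative" ;;      # two patterns, one action
        *)      result="$result unknown" ;;       # acts like a C "default:"
    esac
done

# No pattern matches, so the case command has exit status zero.
case nothing in
    something) : ;;
esac
nomatch_status=$?
```

- No fall-through occurs: only the compound-list of the first matching
- pattern list is executed.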
-
- END_RATIONALE
-
-
- 3.9.4.4 if Conditional Construct
-
- The if command shall execute a compound-list and use its exit status to
- determine whether to execute another compound-list.
-
- The format for the if construct is as follows.
-
- if compound-list
- then
- compound-list
- [elif compound-list
- then
- compound-list] ...
- [else
- compound-list]
- fi
-
- The if compound-list is executed; if its exit status is zero, the then
- compound-list is executed and the command shall complete. Otherwise,
- each elif compound-list is executed, in turn, and if its exit status is
- zero, the then compound-list is executed and the command shall complete.
- Otherwise, the else compound-list is executed.
-
- Exit Status
-
- The exit status of the if command shall be the exit status of the then or
- else compound-list that was executed, or zero, if none was executed.
-
- BEGIN_RATIONALE
-
- 3.9.4.4.1 if Conditional Construct Rationale. (This subclause is not a
- part of P1003.2)
-
- The precise format for the command syntax is described in 3.10.
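-
- As an informal illustration of the if construct in 3.9.4.4, the sketch
- below uses assumed variable names and the test utility ([) as the
- condition.

```shell
# Illustrative names only; not taken from the standard.
n=5
if [ "$n" -lt 0 ]
then
    sign=negative
elif [ "$n" -eq 0 ]
then
    sign=zero
else
    sign=positive
fi

# The condition fails and there is no else part: no then or else
# compound-list is executed, so the if command has exit status zero.
if false
then
    :
fi
if_status=$?
```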
-
- END_RATIONALE
-
-
- 3.9.4.5 while Loop
-
- The while loop shall continuously execute one compound-list as long as
- another compound-list has a zero exit status.
-
- The format of the while loop is as follows.
-
- while compound-list-1
- do
- compound-list-2
- done
-
- The compound-list-1 shall be executed, and if it has a nonzero exit
- status, the while command shall complete. Otherwise, the compound-list-2
- shall be executed, and the process shall repeat.
-
- Exit Status
-
- The exit status of the while loop shall be the exit status of the last
- compound-list-2 executed, or zero if none was executed.
-
- BEGIN_RATIONALE
-
- 3.9.4.5.1 while Loop Rationale. (This subclause is not a part of
- P1003.2)
-
- The precise format for the command syntax is described in 3.10.
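-
- A small sketch of the while loop in 3.9.4.5, using assumed variable
- names:

```shell
# Illustrative names only; not taken from the standard.
i=0
sum=0
while [ "$i" -lt 3 ]      # compound-list-1: loop while this succeeds
do
    i=$((i + 1))          # compound-list-2
    sum=$((sum + i))
done

# compound-list-1 fails at once: the body never runs and the status is zero.
while false
do
    :
done
while_status=$?
```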
-
- END_RATIONALE
-
-
- 3.9.4.6 until Loop
-
- The until loop shall continuously execute one compound-list as long as
- another compound-list has a nonzero exit status.
-
- The format of the until loop is as follows.
-
- until compound-list-1
- do
- compound-list-2
- done
-
- The compound-list-1 shall be executed, and if it has a zero exit status,
- the until command shall complete. Otherwise, the compound-list-2 shall
- be executed, and the process shall repeat.
-
- Exit Status
-
- The exit status of the until loop shall be the exit status of the last
- compound-list-2 executed, or zero if none was executed.
-
- BEGIN_RATIONALE
-
- 3.9.4.6.1 until Loop Rationale. (This subclause is not a part of
- P1003.2)
-
- The precise format for the command syntax is described in 3.10.
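-
- The until loop in 3.9.4.6 is the mirror image of while; a sketch with
- assumed variable names:

```shell
# Illustrative names only; not taken from the standard.
n=3
countdown=
until [ "$n" -eq 0 ]      # compound-list-1: loop until this succeeds
do
    countdown="$countdown$n"
    n=$((n - 1))
done

# compound-list-1 succeeds at once: the body never runs, status is zero.
until true
do
    :
done
until_status=$?
```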
-
- END_RATIONALE
-
-
- 3.9.5 Function Definition Command
-
- A function is a user-defined name that is used as a simple command to
- call a compound command with new positional parameters. A function is
- defined with a function definition command.
-
- The format of a function definition command is as follows:
-
- fname() compound-command [io-redirect ...]
-
- The function is named fname; it shall be a name (see 3.1.5). An
- implementation may allow other characters in a function name as an
- extension. The implementation shall maintain separate namespaces for
- functions and variables.
-
- The argument compound-command represents a compound command, as described
- in 3.9.4.
-
- When the function is declared, none of the expansions in 3.6 shall be
- performed on the text in compound-command or io-redirect; all expansions
- shall be performed as normal each time the function is called.
- Similarly, the optional io-redirect redirections and any variable
- assignments within compound-command shall be performed during the
- execution of the function itself, not the function definition. See 3.8.1
- for the consequences of failures of these operations on interactive and
- noninteractive shells.
-
- When a function is executed, it shall have the syntax-error and
- variable-assignment properties described for special built-in utilities,
- in the enumerated list at the beginning of 3.14.
-
- The compound-command shall be executed whenever the function name is
- specified as the name of a simple command (see 3.9.1.1). The operands to
- the command shall temporarily become the positional parameters during the
- execution of the compound-command; the special parameter # shall also be
- changed to reflect the number of operands. The special parameter 0 shall
- be unchanged. When the function completes, the values of the positional
- parameters and the special parameter # shall be restored to the values
- they had before the function was executed. If the special built-in
- return is executed in the compound-command, the function shall complete
- and execution shall resume with the next command after the function call.
-
- Exit Status
-
- The exit status of a function definition shall be zero if the function
- was declared successfully; otherwise, it shall be greater than zero. The
- exit status of a function invocation shall be the exit status of the last
- command executed by the function.
-
- BEGIN_RATIONALE
-
-
- 3.9.5.1 Function Definition Command Rationale. (This subclause is not a
- part of P1003.2)
-
- The description of functions in Draft 8 was based on the notion that
- functions should behave like miniature shell scripts; that is, except for
- sharing variables, most elements of an execution environment should
- behave as if it were a new execution environment, and changes to these
- should be local to the function. For example, traps and options should
- be reset on entry to the function, and any changes to them should not
- affect the traps or options of the caller. There were numerous objections to
- this basic idea, and the opponents asserted that functions were intended
- to be a convenient mechanism for grouping commonly executed commands that
- were to be executed in the current execution environment, similar to the
- execution of the dot special built-in.
-
- Opponents also pointed out that the functions described in Draft 8 did
- not scope everything a new shell script would anyway, such as the current
- working directory, or umask, but instead picked a few select properties.
- The basic argument was that if one wanted scoping of the execution
- environment, the mechanism already exists: put the commands in a new
- shell script and call it. All traditional shells that implemented
- functions, other than the KornShell, have implemented functions that
- operate in the current execution environment. Because of this, Draft 9
- removed any local scoping of traps or options. Local variables within a
- function were considered and included in Draft 9 (controlled by the
- special built-in local), but were removed because they do not fit the
- simple model developed for the scoping of functions and there was some
- opposition to adding yet another new special built-in from outside
- existing practice. Implementations should reserve the identifier local
- (as well as typeset, as used in the KornShell) in case this local
- variable mechanism is adopted in a future version of POSIX.2.
-
- A separate issue from the execution environment of a function is the
- availability of that function to child shells. A few objectors,
- including the author of the original Version 7 UNIX system shell,
- maintained that just as a variable can be shared with child shells by
- exporting it, so should a function, and so this capability was added
- to earlier drafts. In those drafts, the export command therefore had a
- -f flag for exporting functions. Functions that were exported were to be
- put into the environment as name()=value pairs, and upon invocation, the
- shell would scan the environment for these and automatically define
- these functions. This facility received a lot of balloting opposition
- and was removed from Draft 11. Some of the arguments against exportable
- functions were:
-
- - There was little existing practice. The Ninth Edition shell
- provided them, but there was controversy over how well they worked.
-
- - There are numerous security problems associated with functions
- appearing in a script's environment and overriding standard
- utilities or the application's own utilities.
-
- - There was controversy over requiring make to import functions,
- where it has historically used an exec function for many of its
- command line executions.
-
- - Functions can be big and the environment is of a limited size.
- (The counter-argument was that functions are no different than
- variables in terms of size: there can be big ones, and there can be
- small ones--and just as one does not export huge variables, one
- does not export huge functions. However, this insight might be
- lost on the average shell-function writer, who typically writes
- much larger functions than variables.)
-
- As far as can be determined, the functions in POSIX.2 match those in
- System V. The KornShell has two methods of defining functions:
-
- function fname { compound-list }
-
- and
-
- fname() { compound-list }
-
- The latter uses the same definition as POSIX.2, but differs in semantics,
- as described previously. A future edition of the KornShell is planned to
- align the latter syntax with POSIX and keep the former as-is.
-
- The name space for functions is limited to that of a name because of
- historical practice. Complications in defining the syntactic rules for
- the function definition command and in dealing with known extensions such
- as the KornShell's @() prevented the name space from being widened to a
- word, as requested by some balloters. Using functions to support
- synonyms such as the C-shell's !! and % is thus disallowed for portable
- applications, but acceptable as an extension. For interactive users, the
- aliasing facilities in the UPE should be adequate for this purpose. It
- is recognized that the name space for utilities in the file system is
- wider than that currently supported for functions, if the portable
- filename character set guidelines are ignored, but it did not seem useful
- to mandate extensions in systems for so little benefit to portable
- applications.
-
- The () in the function definition command consists of two operators.
- Therefore, intermixing <blank>s with the fname, (, and ) is allowed, but
- unnecessary.
-
- An example of how a function definition can be used wherever a simple
- command is allowed:
-
- # If variable i is equal to "yes",
- # define function foo to be ls -l
- #
- [ X$i = Xyes ] && foo() {
- ls -l
- }
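-
- A further sketch, under assumed names, of the calling behavior described
- in 3.9.5: expansions in the body happen at call time, the operands
- become the positional parameters only for the duration of the call, and
- return ends the function early.

```shell
# Illustrative names only; not taken from the standard.
set -- outer1 outer2
greeting=hello            # not expanded when show() is defined

show() {
    inner_count=$#        # number of operands passed to the function
    inner_first=$1
    seen=$greeting        # expanded each time the function is called
    return 3              # the function completes here
    unreachable=yes       # never executed
}

greeting=goodbye          # changed after the definition, before the call
show a b c
status=$?

outer_count=$#            # positional parameters are restored afterwards
outer_first=$1
```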
-
- END_RATIONALE
-
-
-
- 3.10 Shell Grammar
-
- The following grammar describes the Shell Command Language. Any
- discrepancies found between this grammar and the preceding description
- shall be resolved in favor of this clause.
-
-
- 3.10.1 Shell Grammar Lexical Conventions
-
- The input language to the shell must be first recognized at the character
- level. The resulting tokens shall be classified by their immediate
- context according to the following rules (applied in order). These rules
- are used to determine what a ``token'' that is subject to parsing at the
- token level is. The rules for token recognition in 3.3 shall apply.
-
- (1) A <newline> shall be returned as the token identifier NEWLINE.
-
- (2) If the token is an operator, the token identifier for that
- operator shall result.
-
- (3) If the string consists solely of digits and the delimiter
- character is one of < or >, the token identifier IO_NUMBER shall
- be returned.
-
- (4) Otherwise, the token identifier TOKEN shall result.
-
- Further distinction on TOKEN is context-dependent. It may be that the
- same TOKEN yields WORD, a NAME, an ASSIGNMENT, or one of the reserved
- words below, dependent upon the context. Some of the productions in the
- grammar below are annotated with a rule number from the following list.
- When a TOKEN is seen where one of those annotated productions could be
- used to reduce the symbol, the applicable rule shall be applied to
- convert the token identifier type of the TOKEN to a token identifier
- acceptable at that point in the grammar. The reduction shall then
- proceed based upon the token identifier type yielded by the rule applied.
- When more than one rule applies, the highest numbered rule shall apply
- (which in turn may refer to another rule). [Note that except in rule
- (7), the presence of an = in the token has no effect.]
-
- The WORD tokens shall have the word expansion rules applied to them
- immediately before the associated command is executed, not at the time
- the command is parsed.
-
-
- 3.10.2 Shell Grammar Rules
-
- (1) [Command Name]
- When the TOKEN is exactly a reserved word, the token identifier
- for that reserved word shall result. Otherwise, the token WORD
- shall be returned. Also, if the parser is in any state where
- only a reserved word could be the next correct token, proceed as
- above.
-
- NOTE: Because at this point quote marks are retained in the
- token, quoted strings cannot be recognized as reserved words.
- This rule also implies that reserved words will not be
- recognized except in certain positions in the input, such as
- after a <newline> or semicolon; the grammar presumes that if the
- reserved word is intended, it will be properly delimited by the
- user, and does not attempt to reflect that requirement directly.
- Also note that line joining is done before tokenization, as
- described in 3.2.1, so escaped newlines are already removed at
- this point.
-
- NOTE: Rule (1) is not directly referenced in the grammar, but
- is referred to by other rules, or applies globally.
-
- (2) [Redirection to/from filename]
- The expansions specified in 3.7 shall occur. As specified
- there, exactly one field can result (or the result is
- unspecified), and there are additional requirements on pathname
- expansion.
-
- (3) [Redirection from here-document]
- Quote removal [3.7.4] shall be applied to the word to
- determine the delimiter that will be used to find the end of the
- here-document that begins after the next <newline>.
-
- (4) [Case statement termination]
- When the TOKEN is exactly the reserved word Esac, the token
- identifier for Esac shall result. Otherwise, the token WORD
- shall be returned.
-
- (5) [NAME in for]
- When the TOKEN meets the requirements for a name [3.1.5], the
- token identifier NAME shall result. Otherwise, the token WORD
- shall be returned.
-
- (6) [Third word of for and case]
- When the TOKEN is exactly the reserved word In, the token
- identifier for In shall result. Otherwise, the token WORD shall
- be returned.
-
- (7) [Assignment preceding command name]
-
- (a) [When the first word]
- If the TOKEN does not contain the character =, rule (1)
- shall be applied. Otherwise, apply (7)(b).
-
- (b) [Not the first word]
- If the TOKEN contains the equals-sign character:
-
- - If it begins with =, the token WORD shall be returned.
-
- - If all the characters preceding = form a valid name
- [3.1.5], the token ASSIGNMENT_WORD shall be returned.
- (Quoted characters cannot participate in forming a
- valid name.)
-
- - Otherwise, it is unspecified whether it is
- ASSIGNMENT_WORD or WORD that is returned.
-
- Assignment to the NAME shall occur as specified in 3.9.1.
-
- (8) [NAME in function]
- When the TOKEN is exactly a reserved word, the token identifier
- for that reserved word shall result. Otherwise, when the TOKEN
- meets the requirements for a name [3.1.5], the token identifier
- NAME shall result. Otherwise, rule (7) shall apply.
-
- (9) [Body of function]
- Word expansion and assignment shall never occur, even when
- required by the rules above, when this rule is being parsed.
- Each TOKEN that might either be expanded or have assignment
- applied to it shall instead be returned as a single WORD
- consisting only of characters that are exactly the token
- described in 3.3.
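-
- As an informal illustration of rule (7), the sketch below shows the same
- token, x=1, treated as an assignment before the command name and as an
- ordinary word after it. The variable names are assumptions, and the
- example assumes that a utility named sh is available.

```shell
# Illustrative names only; not taken from the standard.
unset x

# Before the command name: x=1 is an ASSIGNMENT_WORD, so the
# assignment is made in the environment of the invoked command.
in_child=$(x=1 sh -c 'echo "${x-unset}"')

# After the command name: x=1 is an ordinary WORD passed as an operand.
as_operand=$(echo x=1)

# The assignment above applied only to the invoked command.
in_parent=${x-unset}
```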
-
- /* -------------------------------------------------------
- The grammar symbols
- ------------------------------------------------------- */
-
- %token WORD
- %token ASSIGNMENT_WORD
- %token NAME
- %token NEWLINE
- %token IO_NUMBER
-
- /* The following are the operators mentioned above. */
-
- %token AND_IF OR_IF DSEMI
- /* '&&' '||' ';;' */
-
- %token DLESS DGREAT LESSAND GREATAND LESSGREAT DLESSDASH
- /* '<<' '>>' '<&' '>&' '<>' '<<-' */
-
- %token CLOBBER
- /* '>|' */
-
- /* The following are the reserved words */
-
- %token If Then Else Elif Fi Do Done
- /* 'if' 'then' 'else' 'elif' 'fi' 'do' 'done' */
-
- %token Case Esac While Until For
- /* 'case' 'esac' 'while' 'until' 'for' */
-
- /* These are reserved words, not operator tokens, and are
- recognized when reserved words are recognized. */
-
- %token Lbrace Rbrace Bang
- /* '{' '}' '!' */
-
- %token In
- /* 'in' */
-
- /* -------------------------------------------------------
- The Grammar
- ------------------------------------------------------- */
-
- %start complete_command
-
- %%
-
- complete_command : list separator
- | list
- ;
-
- list : list separator_op and_or
- | and_or
- ;
-
- and_or : pipeline
- | and_or AND_IF linebreak pipeline
- | and_or OR_IF linebreak pipeline
- ;
-
- pipeline : pipe_sequence
- | Bang pipe_sequence
- ;
-
- pipe_sequence : command
- | pipe_sequence '|' linebreak command
- ;
-
- command : simple_command
- | compound_command
- | compound_command redirect_list
- | function_definition
- ;
-
- compound_command : brace_group
- | subshell
- | for_clause
- | case_clause
- | if_clause
- | while_clause
- | until_clause
- ;
-
- subshell : '(' compound_list ')'
- ;
-
- compound_list : term
- | newline_list term
- | term separator
- | newline_list term separator
- ;
-
- term : term separator and_or
- | and_or
- ;
-
- for_clause : For name do_group
- | For name in wordlist sequential_sep do_group
- ;
-
- name : NAME /* Apply rule (5) */
- ;
-
- in : In /* Apply rule (6) */
- ;
-
- wordlist : wordlist WORD
- | WORD
- ;
-
- case_clause : Case WORD in linebreak case_list Esac
- | Case WORD in linebreak Esac
- ;
-
- case_list : case_list case_item
- | case_item
- ;
-
- case_item : pattern ')' linebreak DSEMI linebreak
- | pattern ')' compound_list DSEMI linebreak
- | '(' pattern ')' linebreak DSEMI linebreak
- | '(' pattern ')' compound_list DSEMI linebreak
- ;
-
- pattern : WORD /* Apply rule (4) */
- | pattern '|' WORD /* Do not apply rule (4) */
- ;
-
- if_clause : If compound_list Then compound_list else_part Fi
- | If compound_list Then compound_list Fi
- ;
-
- else_part : Elif compound_list Then else_part
- | Else compound_list
- ;
-
- while_clause : While compound_list do_group
- ;
-
- until_clause : Until compound_list do_group
- ;
-
- function_definition : fname '(' ')' linebreak function_body
- ;
-
- function_body : compound_command /* Apply rule (9) */
- | compound_command redirect_list /* Apply rule (9) */
- ;
-
- fname : NAME /* Apply rule (8) */
- ;
-
- brace_group : Lbrace compound_list Rbrace
- ;
-
- do_group : Do compound_list Done
- ;
-
- simple_command : cmd_prefix cmd_word cmd_suffix
- | cmd_prefix cmd_word
- | cmd_prefix
- | cmd_name cmd_suffix
- | cmd_name
- ;
-
- cmd_name : WORD /* Apply rule (7)(a) */
- ;
-
- cmd_word : WORD /* Apply rule (7)(b) */
- ;
-
- cmd_prefix : io_redirect
- | cmd_prefix io_redirect
- | ASSIGNMENT_WORD
- | cmd_prefix ASSIGNMENT_WORD
- ;
-
- cmd_suffix : io_redirect
- | cmd_suffix io_redirect
- | WORD
- | cmd_suffix WORD
- ;
-
-
- redirect_list : io_redirect
- | redirect_list io_redirect
- ;
-
- io_redirect : io_file
- | IO_NUMBER io_file
- | io_here
- | IO_NUMBER io_here
- ;
-
- io_file : '<' filename
- | LESSAND filename
- | '>' filename
- | GREATAND filename
- | DGREAT filename
- | LESSGREAT filename
- | CLOBBER filename
- ;
-
- filename : WORD /* Apply rule (2) */
- ;
-
- io_here : DLESS here_end
- | DLESSDASH here_end
- ;
-
- here_end : WORD /* Apply rule (3) */
- ;
-
- newline_list : NEWLINE
- | newline_list NEWLINE
- ;
-
- linebreak : newline_list
- | /* empty */
- ;
-
- separator_op : '&'
- | ';'
- ;
-
- separator : separator_op linebreak
- | newline_list
- ;
-
- sequential_sep : ';' linebreak
- | newline_list
- ;
-
- BEGIN_RATIONALE
-
-
- 3.10.3 Shell Grammar Rationale. (This subclause is not a part of
- P1003.2)
-
- There are several subtle aspects of this grammar where conventional usage
- implies rules about the grammar that in fact are not true.
-
- For compound_list, only the forms that end in a separator allow a
- reserved word to be recognized, so usually only a separator can be used
- where a compound list precedes a reserved word (such as Then, Else, Do,
- and Rbrace). Explicitly requiring a separator would disallow such valid
- (if rare) statements as:
-
- if (false) then (echo x) else (echo y) fi
-
- See the NOTE under special grammar rule (1).
-
- Concerning the third sentence of rule (1) (``Also, if the parser ...''):
-
- - This sentence applies rather narrowly: when a compound list is
- terminated by some clear delimiter (such as the closing fi of an
- inner if_clause) then it would apply; where the compound list might
- continue (as in after a ;), rule (7)(a) [and consequently the first
- sentence of rule (1)] would apply. In many instances the two
- conditions are identical, but this part of rule (1) does not give
- license to treat a WORD as a reserved word unless it is in a
- place where a reserved word must appear.
-
- - The statement is equivalent to requiring that when the LR(1)
- lookahead set contains exactly a reserved word, it must be
- recognized if it is present. (Here ``LR(1)'' refers to the
- theoretical concepts, not to any real parser generator.)
-
- For example, in the construct below, when the parser is at the
- point marked with ^, the only next legal token is then (this
- follows directly from the grammar rules).
-
-     if if....fi then .... fi
-                ^
-
- At that point, the then must be recognized as a reserved word.
-
- (Depending on the parser generator actually used, ``extra''
- reserved words may be in some lookahead sets. It does not really
- matter if they are recognized, or even if any possible reserved
- word is recognized in that state, because if it is recognized and
- is not in the (theoretical) LR(1) lookahead set, an error will
- ultimately be detected. In the example above, if some other
- reserved word (e.g., while) is also recognized, an error will occur
- later.)
-
- This is approximately equivalent to saying that reserved words are
- recognized after other reserved words (because it is after a
- reserved word that this condition will occur), but avoids the
- ``except for...'' list that would be required for case, for, etc.
- (Reserved words are of course recognized anywhere a simple_command
- can appear, as well. Other rules take care of the special cases of
- nonrecognition, such as rule (4) for case statements.)
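-
- The recognition rules discussed above can be observed informally with
- the following sketch. The variable names are assumptions made for
- illustration.

```shell
# Illustrative names only; not taken from the standard.

# As operands of echo, these tokens are not where a command name could
# appear, so they are plain WORDs rather than reserved words.
words=$(echo if then fi done)

# The "if if....fi then .... fi" construct: after the inner fi, only
# then can follow, so it is recognized without a separator.
if if true; then false; fi then
    outcome=then
else
    outcome=else          # taken: the inner if has a nonzero exit status
fi
```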
-
- Note that the bodies of here-documents are handled by token recognition
- (see 3.3) and do not appear in the grammar directly. (However, the
- here-document I/O redirection operator is handled as part of the
- grammar.)
-
- The start symbol of the grammar (complete_command) represents either
- input from the command line or a shell script. It is repeatedly applied
- by the interpreter to its input, and represents a single ``chunk'' of
- that input as seen by the interpreter.
-
- END_RATIONALE
-
-
-
- 3.11 Signals and Error Handling
-
- When a command is in an asynchronous list, the shell shall prevent
- SIGQUIT and SIGINT signals from the keyboard from interrupting the
- command. Otherwise, signals shall have the values inherited by the shell
- from its parent (see also 3.14.13).
-
- When a signal for which a trap has been set is received while the shell
- is waiting for the completion of a utility executing a foreground
- command, the trap associated with that signal shall not be executed until
- after the foreground command has completed. When the shell is waiting,
- by means of the wait utility, for asynchronous commands to complete, the
- reception of a signal for which a trap has been set shall cause the wait
- utility to return immediately with an exit status >128, immediately after
- which the trap associated with that signal shall be taken.
-
- If multiple signals are pending for the shell for which there are
- associated trap actions (see 3.14.13), the order of execution of trap
- actions is unspecified.
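-
- As an illustration only (not part of the requirements), the wait
- behavior described in this clause might be observed with a sketch such
- as the following. The function name wait_interrupted, the variables it
- sets, and the choice of SIGUSR1 are assumptions made for this example.
-
```shell
# Hypothetical demo: record what `wait` returns when a trapped signal
# arrives while the shell is waiting for an asynchronous command.
wait_interrupted() {
    trap_ran=no
    wait_status=0
    trap 'trap_ran=yes' USR1

    sleep 5 &                        # the asynchronous command
    child=$!

    # Send SIGUSR1 to this shell about one second from now.
    ( sleep 1; kill -s USR1 "$$" ) &

    # wait is expected to return early with a status greater than 128;
    # the trap action is then taken.
    wait "$child" || wait_status=$?

    kill "$child" 2>/dev/null || :   # clean up the sleeping child
    trap - USR1
}
```
-
- The function must be called directly in the current shell (not in a
- command substitution), because the signal is sent to the process ID
- given by $$. Afterwards wait_status is expected to exceed 128 and
- trap_ran to be yes.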
-
-
- 3.12 Shell Execution Environment
-
- A shell execution environment consists of the following:
-
- - Open files inherited upon invocation of the shell, plus open files
- controlled by exec.
-
- - Working Directory as set by cd (see 4.5).
-
- - File Creation Mask set by umask (see 4.67).
-
- - Current traps set by trap (see 3.14.13).
-
- - Shell parameters that are set by variable assignment (see set in
- 3.14.11) or from the POSIX.1 {8} environment inherited by the shell
- when it begins (see export in 3.14.8).
-
- - Shell functions (see 3.9.5).
-
- - Options turned on at invocation or by set.
-
- - Process IDs of the last commands in asynchronous lists known to
- this shell environment; see 3.9.3.1.
-
- Utilities other than the special built-ins (see 3.14) shall be invoked in
- a separate environment that consists of the following. The initial value
- of these objects shall be the same as that for the parent shell, except
- as noted below.
-
- - Open files inherited on invocation of the shell, open files
- controlled by the exec special built-in (see 3.14.6), plus any
- modifications and additions specified by any redirections to the
- utility.
-
- - Current working directory.
-
- - File creation mask.
-
- - If the utility is a shell script, traps caught by the shell shall
- be set to the default values and traps ignored by the shell shall
- be set to be ignored by the utility. If the utility is not a shell
- script, the trap actions (default or ignore) shall be mapped into
- the appropriate signal handling actions for the utility.
-
- - Variables with the export attribute, along with those explicitly
- exported for the duration of the command, shall be passed to the
- utility as POSIX.1 {8} environment variables.
-
- The environment of the shell process shall not be changed by the utility
- unless explicitly specified by the utility description (for example, cd
- and umask).
-
- A subshell environment shall be created as a duplicate of the shell
- environment, except that signal traps set by that shell environment shall
- be set to the default values. Changes made to the subshell environment
- shall not affect the shell environment. Command substitution, commands
- that are grouped with parentheses, and asynchronous lists shall be
- executed in a subshell environment. Additionally, each command of a
- multicommand pipeline is in a subshell environment; as an extension,
- however, any or all commands in a pipeline may be executed in the current
- environment. All other commands shall be executed in the current shell
- environment.
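-
- As an illustration only, the isolation of subshell environments might
- be checked with a sketch like the one below. The function name
- subshell_demo and the variable names are assumptions for this example.
-
```shell
# Hypothetical demo: changes made in subshell environments are not
# visible in the current shell environment.
subshell_demo() {
    x=original
    ( x=set_in_parentheses )            # commands grouped with parentheses
    unused=$(x=set_in_substitution)     # command substitution
    { x=set_in_background; } &          # asynchronous list
    wait
    echo "$x"
}
```
-
- Each of the three assignments happens in a subshell environment, so the
- function is expected to write original.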
-
- BEGIN_RATIONALE
-
-
- 3.12.0.1 Shell Execution Environment Rationale. (This subclause is not a
- part of P1003.2)
-
- Some systems have implemented the last stage of a pipeline in the current
- environment so that commands such as
-
- command | read foo
-
- set variable foo in the current environment. It was decided to allow
- this extension, but not require it; therefore, a shell programmer should
- consider a pipeline to be in a subshell environment, but not depend on
- it.
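-
- One way a script might avoid depending on where the last stage of a
- pipeline runs is to redirect the input of read itself. The sketch
- below is illustrative; the function name read_into_foo is an assumption.
-
```shell
# Hypothetical helper: store the first line of its argument in foo
# without using a pipeline.
read_into_foo() {
    # The here-document is a redirection on read, so read executes in
    # the current shell environment and foo remains set afterwards.
    read -r foo <<EOF
$1
EOF
}
```
-
- After read_into_foo "hello world", the variable foo is expected to hold
- hello world in the calling shell.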
-
- The previous description of execution environment failed to mention that
- each command in a multiple command pipeline could be in a subshell
- execution environment. For compatibility with some existing shells, the
- wording was phrased to allow an implementation to place any or all
- commands of a pipeline in the current environment. However, this means
- that a POSIX application must assume each command is in a subshell
- environment, but not depend on it.
-
- The wording about shell scripts is meant to convey the fact that
- describing ``trap actions'' can only be understood in the context of the
- shell command language. Outside this context, such as in a C-language
- program, signals are the operative condition, not traps.
-
- END_RATIONALE
-
-
- 3.13 Pattern Matching Notation
-
- The pattern matching notation described in this clause is used to specify
- patterns for matching strings in the shell. Historically, pattern
- matching notation is related to, but slightly different from, the regular
- expression notation described in 2.8. For this reason, the description
- of the rules for this pattern matching notation is based on the
- description of regular expression notation.
-
- BEGIN_RATIONALE
-
-
- 3.13.0.1 Pattern Matching Notation Rationale. (This subclause is not a
- part of P1003.2)
-
- Pattern matching is a simpler concept and has a simpler syntax than
- regular expressions, as the former is generally used for the manipulation
- of file names, which are relatively simple collections of characters,
- while the latter is generally used to manipulate arbitrary text strings
- of potentially greater complexity. However, some of the basic concepts
- are the same, so this clause points liberally to the detailed
- descriptions in 2.8.
-
- END_RATIONALE
-
-
- 3.13.1 Patterns Matching a Single Character
-
- The following patterns matching a single character match a single
- character: ordinary characters, special pattern characters, and pattern
- bracket expressions. The pattern bracket expression also shall match a
- single collating element.
-
- An ordinary character is a pattern that shall match itself. It can be
- any character in the supported character set except for NUL, those
- special shell characters in 3.2 that require quoting, and the following
- three special pattern characters. Matching shall be based on the bit
- pattern used for encoding the character, not on the graphic
- representation of the character. If any character (ordinary, shell
- special, or pattern special) is quoted, that pattern shall match the
- character itself. The shell special characters always require quoting.
-
- When unquoted and outside a bracket expression, the following three
- characters shall have special meaning in the specification of patterns:
-
- ? A question-mark is a pattern that shall match any character.
-
- * An asterisk is a pattern that shall match multiple characters,
- as described in 3.13.2.
-
- [ The open bracket shall introduce a pattern bracket expression.
-
- The description of basic regular expression bracket expressions in
- 2.8.3.2 also shall apply to the pattern bracket expression, except that
- the exclamation-mark character (!) shall replace the circumflex character
- (^) in its role in a nonmatching list in the regular expression notation.
- A bracket expression starting with an unquoted circumflex character
- produces unspecified results.
-
- When pattern matching is used where shell quote removal is not performed
- [such as in the argument to the find -name primary when find is being
- called using an exec function, or in the pattern argument to the
- fnmatch() function], special characters can be escaped to remove their
- special meaning by preceding them with a <backslash>. This escaping
- <backslash> shall be discarded. The sequence \\ shall represent one
- literal backslash. All of the requirements and effects of quoting on
- ordinary, shell special, and special pattern characters shall apply to
- escaping in this context.
-
- BEGIN_RATIONALE
-
-
- 3.13.1.1 Patterns Matching a Single Character Rationale. (This subclause
- is not a part of P1003.2)
-
- Both ``quoting'' and ``escaping'' are described here because pattern
- matching must work in three separate circumstances:
-
- - Calling directly upon the shell, such as in pathname expansion or
- in a case statement. All of the following will match the string or
- file abc: abc, "abc", a"b"c, a\bc, a[b]c, a["b"]c, a[\b]c, a?c,
- a*c. The following will not: "a?c", a\*c, a\[b]c, a["\b"]c.
-
- - Calling a utility or function without going through a shell, as
- described for find and fnmatch().
-
- - Calling utilities such as find or pax through the shell command
- line. (Although find and pax are the only instances of this in the
- standard utilities, describing it globally here is useful for
- future utilities that may use pattern matching internally.) In
- this case, shell quote removal is performed before the utility sees
- the argument. For example, in
-
- find /bin -name "e\c[\h]o" -print
-
- after quote removal, the backslashes are presented to find and it
- treats them as escape characters. Both precede ordinary
- characters, so the c and h represent themselves and echo would be
- found on many historical systems (that have it in /bin). To find a
- filename that contained shell special characters or pattern
- characters, both quoting and escaping are required, such as
-
- pax -r ... "*a\(\?"
-
- to extract a filename ending with ``a(?''.
-
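- The first circumstance (patterns used directly in a case statement)
- might be exercised with a sketch such as the following. The function
- name abc_matches and its numeric labels are assumptions for this
- example; the a["\b"]c pattern is left out because shells may disagree
- on it.
-
```shell
# Hypothetical helper: print the labels of the patterns that match "abc".
abc_matches() {
    s=abc
    out=
    case $s in abc)     out="$out 1" ;; esac
    case $s in "abc")   out="$out 2" ;; esac
    case $s in a"b"c)   out="$out 3" ;; esac
    case $s in a\bc)    out="$out 4" ;; esac
    case $s in a[b]c)   out="$out 5" ;; esac
    case $s in a["b"]c) out="$out 6" ;; esac
    case $s in a[\b]c)  out="$out 7" ;; esac
    case $s in a?c)     out="$out 8" ;; esac
    case $s in a*c)     out="$out 9" ;; esac
    # Quoted or escaped special characters only match themselves.
    case $s in "a?c")   out="$out q1" ;; esac
    case $s in a\*c)    out="$out q2" ;; esac
    case $s in a\[b]c)  out="$out q3" ;; esac
    echo $out
}
```
-
- Labels 1 through 9 correspond to the matching patterns listed above;
- the q labels mark the quoted or escaped forms, which are not expected
- to appear in the output.
-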
- Conforming applications are required to quote or escape the shell special
- characters (called ``metacharacters'' in some historical documentation).
- If used without this protection, syntax errors can result or
- implementation extensions can be triggered. For example, the KornShell
- supports a series of extensions based on parentheses in patterns.
-
- The restriction on circumflex in a bracket expression is to allow
- implementations that support pattern matching using circumflex as the
- negation character in addition to the exclamation-mark.
-
- END_RATIONALE
-
-
- 3.13.2 Patterns Matching Multiple Characters
-
- The following rules are used to construct patterns matching multiple
- characters from patterns matching a single character:
-
- (1) The asterisk (*) is a pattern that shall match any string,
- including the null string.
-
- (2) The concatenation of patterns matching a single character is a
- valid pattern that shall match the concatenation of the single
- characters or collating elements matched by each of the
- concatenated patterns.
-
- (3) The concatenation of one or more patterns matching a single
- character with one or more asterisks is a valid pattern. In
- such patterns, each asterisk shall match a string of zero or
- more characters, matching the greatest possible number of
- characters that still allows the remainder of the pattern to
- match the string.
-
- BEGIN_RATIONALE
-
-
- 3.13.2.1 Patterns Matching Multiple Characters Rationale. (This
- subclause is not a part of P1003.2)
-
- Since each asterisk matches ``zero or more'' occurrences, the patterns
- a*b and a**b have identical functionality.
-
- Examples:
-
- a[bc] matches the strings ab and ac.
-
- a*d matches the strings ad, abd, and abcd, but not the string
- abc.
-
- a*d* matches the strings ad, abcd, abcdef, aaaad, and adddd;
-
- *a*d matches the strings ad, abcd, efabcd, aaaad, and adddd.
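-
- The examples above might be checked from a script with a small helper
- built on the case statement. This is a sketch only; the function name
- glob_match is an assumption and is not defined by this standard.
-
```shell
# Hypothetical helper: succeed if string $1 matches pattern $2.
glob_match() {
    case $1 in
        $2) return 0 ;;     # unquoted, so $2 is treated as a pattern
    esac
    return 1
}
```
-
- For instance, glob_match abcd 'a*d' is expected to succeed, while
- glob_match abc 'a*d' is expected to fail. The pattern argument is
- quoted at the call so that the shell does not expand it as a filename
- first.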
-
- END_RATIONALE
-
-
- 3.13.3 Patterns Used for Filename Expansion
-
- The rules described so far in 3.13.1 and 3.13.2 are qualified by the
- following rules that apply when pattern matching notation is used for
- filename expansion.
-
- (1) The slash character in a pathname shall be explicitly matched by
- using one or more slashes in the pattern; it cannot be matched
- by the asterisk or question-mark special characters or by a
- bracket expression. Slashes in the pattern are identified
- before bracket expressions; thus, a slash cannot be included in
- a pattern bracket expression used for filename expansion.
-
- (2) If a filename begins with a period (.), the period shall be
- explicitly matched by using a period as the first character of
- the pattern or immediately following a slash character. The
- leading period shall not be matched by:
-
- - The asterisk or question-mark special characters, or
-
- - A bracket expression containing a nonmatching list (such as
- [!a]), a range expression (such as [%-0]), or a character
- class expression (such as [[:punct:]]).
-
- It is unspecified whether an explicit period in a bracket
- expression matching list (such as [.abc]) can match a leading
- period in a filename.
-
- (3) Specified patterns are matched against existing filenames and
- pathnames, as appropriate. Each component that contains a
- pattern character requires read permission in the directory
- containing that component. Any component that does not contain
- a pattern character requires search permission. For example,
- given the pattern
-
- /foo/bar/x*/bam
-
- search permission is needed for directory /foo, search and read
- permissions are needed for directory bar, and search permission
- is needed for each x* directory. If the pattern matches any
- existing filenames or pathnames, the pattern shall be replaced
- with those filenames and pathnames, sorted according to the
- collating sequence in effect in the current locale. If the
- pattern contains an invalid bracket expression or does not match
- any existing filenames or pathnames, the pattern string shall be
- left unchanged.
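-
- As an illustration only, rules (1) through (3) might be observed in a
- scratch directory with a sketch such as the following. The function
- name expand_demo and the file names are assumptions, and the sketch
- relies on the mktemp utility, which is widely available but is not
- specified by this standard.
-
```shell
# Hypothetical demo: show filename expansion in a scratch directory.
# Assumes mktemp is available (it is not a POSIX.2 utility).
expand_demo() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        : > .hidden
        : > visible
        mkdir sub
        : > sub/inner

        set -- *;          echo "star: $*"     # leading period is skipped
        set -- .h*;        echo "dot: $*"      # explicit period matches
        set -- */inner;    echo "slash: $*"    # slash matched explicitly
        set -- no-such-*;  echo "none: $*"     # no match: left unchanged
    )
    rm -rf "$dir"
}
```
-
- The asterisk alone is expected to yield sub and visible but not
- .hidden, and the pattern that matches nothing is expected to be passed
- through as the literal string no-such-*.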
-
- BEGIN_RATIONALE
-
-
- 3.13.3.1 Patterns Used for Filename Expansion Rationale. (This
- subclause is not a part of P1003.2)
-
- The caveat about a slash within a bracket expression is derived from
- historical practice. The pattern a[b/c]d will not match such pathnames
- as abd or a/d. It will only match a pathname of literally a[b/c]d.
-
- Filenames beginning with a period historically have been specially
- protected from view on UNIX systems. A proposal to allow an explicit
- period in a bracket expression to match a leading period was considered;
- it is allowed as an implementation extension, but a conforming
- application cannot make use of it. If this extension becomes popular in
- the future, it will be considered for a future version of POSIX.2.
-
- Historical systems have varied in their permissions requirements.
- Matching f*/bar has required read permissions on the f* directories in
- the System V shell, but this standard, the C-shell, and KornShell require
- only search permissions.
-
- END_RATIONALE
-
-
- 3.14 Special Built-in Utilities
-
- The following special built-in utilities shall be supported in the shell
- command language. The output of each command, if any, shall be written
- to standard output, subject to the normal redirection and piping possible
- with all commands.
-
- The term built-in implies that the shell can execute the utility directly
- and does not need to search for it. An implementation can choose to make
- any utility a built-in; however, the special built-in utilities described
- here differ from regular built-in utilities in two respects:
-
- (1) A syntax error in a special built-in utility may cause a shell
- executing that utility to abort, while a syntax error in a
- regular built-in utility shall not cause a shell executing that
- utility to abort. (See 3.8.1 for the consequences of errors on
- interactive and noninteractive shells.) If a special built-in
- utility encountering a syntax error does not abort the shell,
- its exit value shall be nonzero.
-
- (2) Variable assignments specified with special built-in utilities
- shall remain in effect after the built-in completes; this shall
- not be the case with a regular built-in or other utility.
-
- As described in 2.3, the special built-in utilities in this clause need
- not be provided in a manner accessible via the POSIX.1 {8} exec family of
- functions.
-
- Some of the special built-ins are described as conforming to the utility
- argument syntax guidelines in 2.10.2. For those that are not, the
- requirement in 2.11.3 that "--" be recognized as a first argument to be
- discarded does not apply and a conforming application shall not use that
- argument.
-
-
- 3.14.1 break - Exit from for, while, or until loop
-
- break [n]
-
- Exit from the smallest enclosing for, while, or until loop, if any; or
- from the nth enclosing loop if n is specified. The value of n is an
- unsigned decimal integer >= 1. The default is equivalent to n=1. If n is
- greater than the number of enclosing loops, the last enclosing loop shall
- be exited from. Execution continues with the command immediately
- following the loop.
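-
- As an illustration only, the effect of an operand of 2 might be seen in
- a sketch such as the following; the function name find_pair is an
- assumption for this example.
-
```shell
# Hypothetical demo: leave both loops as soon as i * j equals 4.
find_pair() {
    for i in 1 2 3
    do
        for j in 1 2 3
        do
            if test $((i * j)) -eq 4
            then break 2        # exit from the second enclosing loop
            fi
        done
    done
    echo "$i $j"
}
```
-
- Execution continues with the echo command following the outer loop, so
- the function is expected to write 2 2.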
-
- Exit Status
-
- 0 Successful completion.
-
- >0 The n value was not an unsigned decimal integer >= 1.
-
- BEGIN_RATIONALE
-
-
- 3.14.1.1 break Rationale. (This subclause is not a part of P1003.2)
-
- Example:
-
-     for i in *
-     do
-         if test -d "$i"
-         then break
-         fi
-     done
-
- Consideration was given to expanding the syntax of break and continue
- to refer to a label associated with the appropriate loop, as a preferable
- alternative to the [n] method. This new method was proposed late in the
- development of the standard and adequate consensus could not be formed to
- include it. However, POSIX.2 does reserve the namespace of command names
- ending with a colon. It is anticipated that a future implementation
- could take advantage of this and provide something like:
-
-     outofloop: for i in a b c d e
-     do
-         for j in 0 1 2 3 4 5 6 7 8 9
-         do
-             if test -r "${i}${j}"
-             then break outofloop
-             fi
-         done
-     done
-
- and that this might be standardized after implementation experience is
- achieved.
-
- END_RATIONALE
-
-
- 3.14.2 colon - Null utility
-
- : [argument ...]
-
- This utility shall only expand command arguments.
-
- Exit Status
-
- Zero.
-
- BEGIN_RATIONALE
-
-
- 3.14.2.1 colon Rationale. (This subclause is not a part of P1003.2)
-
- The colon (:), or null utility, is used when a command is needed, as in
- the then condition of an if command, but nothing is to be done by the
- command.
-
- Example:
-
- : ${X=abc}
- if false
- then :
- else echo $X
- fi
- abc
-
- As with any of the special built-ins, the null utility can also have
- variable assignments and redirections associated with it, such as:
-
- x=y : > z
-
- which sets variable x to the value y (so that it persists after the null
- utility ``completes'') and creates or truncates file z.
-
- END_RATIONALE
-
-
- 3.14.3 continue - Continue for, while, or until loop
-
- continue [n]
-
- The continue utility shall return to the top of the smallest enclosing
- for, while, or until loop, or to the top of the nth enclosing loop, if n
- is specified. This involves repeating the condition list of a while or
- until loop or performing the next assignment of a for loop, and
- reexecuting the loop if appropriate.
-
- The value of n is a decimal integer >= 1. The default is equivalent to
- n=1. If n is greater than the number of enclosing loops, the last
- enclosing loop is used.
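-
- As an illustration only, an operand of 2 might be used to abandon the
- rest of an outer iteration from inside an inner loop. The function
- name rows_without_zero and its data are assumptions for this example.
-
```shell
# Hypothetical demo: count the rows that contain no zero.
rows_without_zero() {
    count=0
    for row in "1 2 3" "4 0 6" "7 8 9"
    do
        for cell in $row
        do
            if test "$cell" -eq 0
            then continue 2     # resume the outer loop with the next row
            fi
        done
        count=$((count + 1))
    done
    echo "$count"
}
```
-
- The second row is skipped before the counter is incremented, so the
- function is expected to write 2.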
-
- Exit Status
-
- 0 Successful completion.
-
- >0 The n value was not an unsigned decimal integer >= 1.
-
- BEGIN_RATIONALE
-
-
- 3.14.3.1 continue Rationale. (This subclause is not a part of P1003.2)
-
- Example:
-
-     for i in *
-     do
-         if test -d "$i"
-         then continue
-         fi
-     done
-
- END_RATIONALE
-
-
- 3.14.4 dot - Execute commands in current environment
-
- . file
-
- The shell shall execute commands from the file in the current
- environment.
-
- If file does not contain a slash, the shell shall use the search path
- specified by PATH to find the directory containing file. Unlike normal
- command search, however, the file searched for by the dot utility need
- not be executable. If no readable file is found, a noninteractive shell
- shall abort; an interactive shell shall write a diagnostic message to
- standard error, but this condition shall not be considered a syntax
- error.
-
- Exit Status
-
- Returns the value of the last command executed, or a zero exit status if
- no command is executed.
-
- BEGIN_RATIONALE
-
-
- 3.14.4.1 dot Rationale. (This subclause is not a part of P1003.2)
-
- Some older implementations searched the current directory for the file,
- even if the value of PATH disallowed it. This behavior was omitted from
- POSIX.2 due to concerns about introducing the susceptibility to trojan
- horses that the user might be trying to avoid by leaving dot out of PATH.
-
- The KornShell version of dot takes optional arguments that are set to the
- positional parameters. This is a valid extension that allows a dot
- script to behave identically to a function.
-
- Example:
-
- cat foobar
- foo=hello bar=world
- . foobar
- echo $foo $bar
- hello world
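-
- Because a file operand without a slash is located through PATH, a
- script that wants a specific file might name it with a pathname that
- contains a slash. The sketch below is illustrative; the function name
- dot_demo is an assumption, and it relies on the mktemp utility, which
- is not specified by this standard.
-
```shell
# Hypothetical demo: read assignments from a file into the current
# environment. Assumes mktemp is available and returns a pathname that
# contains a slash, so that PATH is not searched.
dot_demo() {
    tmp=$(mktemp) || return 1
    echo 'foo=hello bar=world' > "$tmp"
    . "$tmp"                  # executed in the current environment
    rm -f "$tmp"
    echo "$foo $bar"
}
```
-
- The assignments made by the file remain in effect after dot returns,
- so the function is expected to write hello world.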
-
- END_RATIONALE
-
-
- 3.14.5 eval - Construct command by concatenating arguments
-
- eval [argument ...]
-
- The eval utility shall construct a command by concatenating arguments
- together, separating each with a <space>. The constructed command shall
- be read and executed by the shell.
-
- Exit Status
-
- If there are no arguments, or only null arguments, eval shall return a
- zero exit status; otherwise, it shall return the exit status of the
- command defined by the string of concatenated arguments separated by
- spaces.
-
- BEGIN_RATIONALE
-
-
- 3.14.5.1 eval Rationale. (This subclause is not a part of P1003.2)
-
- Example:
-
- foo=10 x=foo
- y='$'$x
- echo $y
- $foo
- eval y='$'$x
- echo $y
- 10
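-
- The same technique might be wrapped in a function that looks up a
- variable by name. This is a sketch only; the function name deref is an
- assumption, and the caller is assumed to pass a valid variable name.
-
```shell
# Hypothetical helper: print the value of the variable named by $1.
# The argument is assumed to be a valid variable name; eval would
# execute anything else as shell code.
deref() {
    eval "echo \"\$$1\""
}
```
-
- With foo=10 and x=foo, the command deref "$x" is expected to write 10.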
-
- END_RATIONALE
-
-
- 3.14.6 exec - Execute commands and open, close, and/or copy file
- descriptors
-
- exec [command [argument ...]]
-
- The exec utility opens, closes, and/or copies file descriptors as
- specified by any redirections as part of the command.
-
- If exec is specified without command or arguments, and any file
- descriptors with numbers > 2 are opened with associated redirection
- statements, it is unspecified whether those file descriptors remain open
- when the shell invokes another utility.
-
- If exec is specified with command, it shall replace the shell with
- command without creating a new process. If arguments are specified, they
- are arguments to command. Redirection shall affect the current shell
- execution environment.
-
- Exit Status
-
- If command is specified, exec shall not return to the shell; rather, the
- exit status of the process shall be the exit status of the program
- implementing command, which overlaid the shell. If command is not found,
- the exit status shall be 127. If command is found, but it is not an
- executable utility, the exit status shall be 126. If a redirection error
- occurs (see 3.8.1), the shell shall exit with a value in the range 1-125.
- Otherwise, exec shall return a zero exit status.
-
- BEGIN_RATIONALE
-
-
- 3.14.6.1 exec Rationale. (This subclause is not a part of P1003.2)
-
- Most historical implementations are not conformant in that
-
- foo=bar exec cmd
-
- does not pass foo to cmd.
-
- Earlier drafts stated that ``If specified without command or argument,
- the shell sets to close-on-exec file numbers greater than 2 that are
- opened in this way, so that they will be closed when the shell invokes
- another program.'' This was based on the behavior of one version of the
- KornShell and was made unspecified when it was realized that some
- existing scripts relied on the more generally historical behavior
- (leaving all file descriptors open). Furthermore, since the application
- should have no cognizance of whether a new shell is simply fork()ed,
- rather than exec()ed, it could not consistently rely on the automatic
- closing behavior anyway. Scripts concerned that child shells could
- misuse open file descriptors can always close them explicitly, as shown
- in one of the following examples.
-
- Examples:
-
- Open readfile as file descriptor 3 for reading:
-
- exec 3< readfile
-
- Open writefile as file descriptor 4 for writing:
-
- exec 4> writefile
-
- Make unit 5 a copy of unit 0:
-
- exec 5<&0
-
- Close file unit 3:
-
- exec 3<&-
-
- Cat the file maggie by replacing the current shell with the cat utility:
-
- exec cat maggie
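-
- The open, read, and close steps shown above might be combined as in the
- following sketch. The function name exec_fd_demo is an assumption, and
- the sketch relies on the mktemp utility, which is not specified by this
- standard.
-
```shell
# Hypothetical demo: read two lines through file descriptor 3.
# Assumes mktemp is available.
exec_fd_demo() {
    tmp=$(mktemp) || return 1
    printf 'first\nsecond\n' > "$tmp"
    exec 3< "$tmp"      # open the file as descriptor 3 for reading
    read -r a <&3       # each read continues where the last one stopped
    read -r b <&3
    exec 3<&-           # close descriptor 3 explicitly
    rm -f "$tmp"
    echo "$a $b"
}
```
-
- Since the descriptor stays open between the two read commands, the
- function is expected to write first second.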
-
- END_RATIONALE
-
-
- 3.14.7 exit - Cause the shell to exit
-
- exit [n]
-
- The exit utility shall cause the shell to exit with the exit status
- specified by the unsigned decimal integer n. If n is specified, but its
- value is not between 0 and 255 inclusive, the exit status is undefined.
-
- A trap on EXIT shall be executed before the shell terminates, except when
- the exit utility is invoked in that trap itself, in which case the shell
- shall exit immediately.
-
- Exit Status
-
- The exit status shall be n, if specified. Otherwise, the value shall be
- the exit value of the last command executed, or zero if no command was
- executed. When exit is executed in a trap action (see 3.14.13), the
- ``last command'' is considered to be the command that executed
- immediately preceding the trap action.
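-
- As an illustration only, the interaction of exit with a trap on EXIT
- might be observed in a subshell, so that the calling shell survives.
- The function name exit_demo is an assumption for this example.
-
```shell
# Hypothetical demo: the EXIT trap runs before the subshell terminates,
# and the operand of exit becomes the subshell's exit status.
exit_demo() {
    status=0
    ( trap 'echo cleanup' EXIT; exit 3 ) || status=$?
    echo "status=$status"
}
```
-
- The function is expected to write cleanup followed by status=3.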
-
- BEGIN_RATIONALE
-
-
- 3.14.7.1 exit Rationale. (This subclause is not a part of P1003.2)
-
- As explained in other clauses, certain exit status values have been
- reserved for special uses and should be used by applications only for
- those purposes:
-
- 126 A file to be executed was found, but it was not an executable
- utility.
-
- 127 A utility to be executed was not found.
-
- >128 A command was interrupted by a signal.
-
- Examples:
-
- Exit with a true value:
-
- exit 0
-
- Exit with a false value:
-
- exit 1
-
- END_RATIONALE
-
-
- 3.14.8 export - Set export attribute for variables
-
- export name[=word]...
- export -p
-
- The shell shall give the export attribute to the variables corresponding
- to the specified names, which shall cause them to be in the environment
- of subsequently executed commands.
-
- When -p is specified, export shall write to the standard output the names
- and values of all exported variables, in the following format:
-
- "export %s=%s\n", <name>, <value>
-
- The shell shall format the output, including the proper use of quoting,
- so that it is suitable for re-input to the shell as commands that achieve
- the same exporting results.
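-
- As an illustration only, the effect of the export attribute on
- subsequently executed commands might be observed as follows. The
- function name export_demo and the variable GREETING are assumptions,
- and a utility named sh is assumed to be found through PATH.
-
```shell
# Hypothetical demo: a variable reaches a child process only after it
# has been given the export attribute.
export_demo() {
    unset GREETING
    GREETING=hello
    before=$(sh -c 'echo "${GREETING-unset}"')
    export GREETING
    after=$(sh -c 'echo "${GREETING-unset}"')
    echo "$before $after"
}
```
-
- The child shell does not see GREETING until export has been applied,
- so the function is expected to write unset hello.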
-
- The export special built-in shall conform to the utility argument syntax
- guidelines described in 2.10.2.
-
- Exit Status
-
- Zero.
-
- BEGIN_RATIONALE
-
-
- 3.14.8.1 export Rationale. (This subclause is not a part of P1003.2)
-
- When no arguments are given, the results are unspecified. Some
- historical shells use the no-argument case as the functional equivalent
- of what is required here with -p. This feature was left unspecified
- because it is not existing practice in all shells and some scripts may
- rely on the now-unspecified results on their implementations. Attempts
- to specify the -p output as the default case were unsuccessful in
- achieving consensus. The -p option was added to allow portable access to
- the values that can be saved and then later restored using, for instance,
- a dot script.
-
- Examples:
-
- Export PWD and HOME variables:
-
- export PWD HOME
-
- Set and export the PATH variable:
-
- export PATH=/local/bin:$PATH
-
- Save and restore all exported variables:
-
- export -p > _t_e_m_p-_f_i_l_e
- unset _a _l_o_t _o_f _v_a_r_i_a_b_l_e_s
- ... _p_r_o_c_e_s_s_i_n_g
- . _t_e_m_p-_f_i_l_e
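-
- The save-and-restore idiom above can be sketched in runnable form as
- follows. This is a minimal sketch under stated assumptions, not a
- definitive implementation: the names DEMO_GREETING, saved_exports,
- after_unset, and restored are hypothetical, and the saved commands are
- kept in a shell variable rather than a temporary file. The sketch relies
- only on the re-input property of the export -p output, because the exact
- text written differs between shells.

```shell
# DEMO_GREETING and the other names are illustrative, not part of POSIX.2.
DEMO_GREETING='hello world; $HOME is not expanded'
export DEMO_GREETING

# Capture commands that re-create every exported variable.
saved_exports=$(export -p)

# Remove the variable and record that it is really gone.
unset DEMO_GREETING
after_unset=${DEMO_GREETING-unset}

# Re-input the saved commands; the quoting in the saved text protects the value.
eval "$saved_exports"

# The value is restored and is again visible to child processes.
restored=$(sh -c 'printf "%s" "$DEMO_GREETING"')
```

- The value deliberately contains a semicolon and a dollar sign; it survives
- the round trip only because the output is quoted for re-input.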
-
- END_RATIONALE
-
-
- 3.14.9 readonly - Set read-only attribute for variables
-
- readonly _n_a_m_e[=_w_o_r_d]...
- readonly -p
-
- The variables whose _n_a_m_es are specified shall be given the readonly
- attribute. The values of variables with the read-only attribute cannot
- be changed by subsequent assignment, nor can those variables be unset by
- the unset utility.
-
- When -p is specified, readonly shall write to the standard output the
- names and values of all read-only variables, in the following format:
-
- "readonly %s=%s\n", <_n_a_m_e>, <_v_a_l_u_e>
-
- The shell shall format the output, including the proper use of quoting,
- so that it is suitable for re-input to the shell as commands that achieve
- the same attribute-setting results.
-
- The readonly special built-in shall conform to the utility argument
- syntax guidelines described in 2.10.2.
-
- _E_x_i_t__S_t_a_t_u_s
-
- Zero.
-
- BEGIN_RATIONALE
-
-
- 3.14.9.1 readonly Rationale. (_T_h_i_s _s_u_b_c_l_a_u_s_e _i_s _n_o_t _a _p_a_r_t _o_f _P_1_0_0_3._2)
-
- Example:
-
- readonly HOME PWD
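-
- The effect of the read-only attribute can be sketched as follows. The
- variable name DEMO_LIMIT and the result names are hypothetical. Each
- attempt is made in a subshell, on the assumption that a failed assignment
- or unset may terminate a noninteractive shell; the sketch only records
- whether the attempt succeeded.

```shell
# DEMO_LIMIT is a hypothetical variable name used only for this sketch.
DEMO_LIMIT=10
readonly DEMO_LIMIT

# Try to assign a new value; the subshell isolates any shell exit.
if ( DEMO_LIMIT=20 ) 2>/dev/null; then
    assign_result=changed
else
    assign_result=refused
fi

# Try to unset the variable; this is refused as well.
if ( unset DEMO_LIMIT ) 2>/dev/null; then
    unset_result=removed
else
    unset_result=refused
fi
```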
-
- Some versions of the shell exist that preserve the read-only attribute
- across separate invocations. POSIX.2 allows this behavior, but does not
- require it.
-
- See the rationale for export (3.14.8.1) for a description of the no-
- argument and -p output cases.
-
- In a previous draft, read-only functions were considered, but they were
- omitted as not being existing practice or particularly useful.
- Furthermore, functions must not be readonly across invocations to
- preclude _s_p_o_o_f_i_n_g (spoofing is the term for the practice of creating a
- program that acts like a well-known utility with the intent of subverting
- the user's real intent) of administrative or security-relevant (or
- -conscious) shell scripts.
-
- END_RATIONALE
-
-
- 3.14.10 return - Return from a function
-
- return [_n]
-
- The return utility shall cause the shell to stop executing the current
- function or dot script (see 3.14.4). If the shell is not currently
- executing a function or dot script, the results are unspecified.
-
- _E_x_i_t__S_t_a_t_u_s
-
- The value of the special parameter ? shall be set to _n, an unsigned
- decimal integer, or to the exit status of the last command executed if _n
- is not specified. If the value of _n is greater than 255, the results are
- undefined. When return is executed in a trap action (see 3.14.13), the
- ``last command'' is considered to be the command that executed
- immediately preceding the trap action.
-
- BEGIN_RATIONALE
-
-
- 3.14.10.1 return Rationale. (_T_h_i_s _s_u_b_c_l_a_u_s_e _i_s _n_o_t _a _p_a_r_t _o_f _P_1_0_0_3._2)
-
- The behavior of return when not in a function or dot script differs
- between the System V shell and the KornShell. In the System V shell this
- is an error, whereas in the KornShell, the effect is the same as exit.
-
- The results of returning a number greater than 255 are undefined because
- of differing practices in the various historical implementations. Some
- shells AND out all but the low order 8 bits; others allow larger values,
- but not of unlimited size.
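-
- A brief sketch of the two forms of return, with and without _n, may help.
- The function names classify and is_even and the status variables are
- hypothetical; all values stay well below 255, so the undefined case
- described above is not touched.

```shell
# Hypothetical helper functions; the names are illustrative only.

# return with an explicit n sets $? to n.
classify() {
    case $1 in
        ok)   return 0 ;;
        warn) return 3 ;;
    esac
    return 1
}

# return without n passes on the exit status of the last command executed.
is_even() {
    [ $(( $1 % 2 )) -eq 0 ]
    return
}

# Record each status; "|| x=$?" keeps the nonzero value without aborting.
warn_status=0;  classify warn  || warn_status=$?
other_status=0; classify other || other_status=$?
even_status=0;  is_even 4      || even_status=$?
odd_status=0;   is_even 7      || odd_status=$?
```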
-
- See the discussion of appropriate exit status values in 3.14.7.1.
-
- END_RATIONALE
-
-
- 3.14.11 set - Set/unset options and positional parameters
-
- set [-aCefnuvx] [_a_r_g_u_m_e_n_t ...]
- set [+aCefnuvx] [_a_r_g_u_m_e_n_t ...]
- set -- [_a_r_g_u_m_e_n_t ...]
-
- _O_b_s_o_l_e_s_c_e_n_t _v_e_r_s_i_o_n:
-
- set - [_a_r_g_u_m_e_n_t ...]
-
- If no options or _a_r_g_u_m_e_n_ts are specified, set shall write the names and
- values of all shell variables in the collation sequence of the current
- locale. Each _n_a_m_e shall start on a separate line, using the format:
-
- "%s=%s\n", <_n_a_m_e>, <_v_a_l_u_e>
-
- The _v_a_l_u_e string shall be written with appropriate quoting so that it is
- suitable for re-input to the shell, (re)setting, as far as possible, the
- variables that are currently set. Readonly variables cannot be reset.
- See the description of shell quoting in 3.2.
-
- When options are specified, they shall set or unset attributes of the
- shell, as described below. When _a_r_g_u_m_e_n_ts are specified, they shall
- cause positional parameters to be set or unset, as described below.
- Setting/unsetting attributes and positional parameters are not
- necessarily related actions, but they can be combined in a single
- invocation of set.
-
- The set utility shall conform to the utility argument syntax guidelines
- described in 2.10.2, except that options can be specified with either a
- leading hyphen (meaning enable the option) or plus-sign (meaning disable
- it).
-
- The implementation shall support the options in the following list in
- both their hyphen and plus-sign forms. These options can also be
- specified as options to sh; see 4.56.
-
- -a When this option is on, the export attribute shall be set
- for each variable to which an assignment is performed.
- (See 3.1.15.) If the assignment precedes a utility name
- in a command, the export attributes shall not persist in
- the current execution environment after the utility
- completes, with the exception that preceding one of the
- special built-in utilities shall cause the export
- attribute to persist after the built-in has completed. If
- the assignment does not precede a utility name in the
- command, or if the assignment is a result of the operation
- of the getopts or read utilities (see 4.27 and 4.52), the
- export attribute shall persist until the variable is
- unset.
-
- -C (Uppercase C.) Prevent existing files from being
- overwritten by the shell's > redirection operator (see
- 3.7.2); the >| redirection operator shall override this
- ``noclobber'' option for an individual file.
-
- -e When this option is on, if a simple command fails for any
- of the reasons listed in 3.8.1 or returns an exit status
- value >0, and is not part of the compound list following a
- while, until, or if keyword, and is not a part of an AND
- or OR list, and is not a pipeline preceded by the !
- reserved word, then the shell shall exit immediately.
-
- -f The shell shall disable pathname expansion.
-
- -n The shell shall read commands but not execute them; this
- can be used to check for shell script syntax errors. An
- interactive shell may ignore this option.
-
- -u The shell shall write a message to standard error when it
- tries to expand a variable that is not set and immediately
- exit. An interactive shell shall not exit.
-
- -v The shell shall write its input to standard error as it is
- read.
-
- -x The shell shall write to standard error a trace for each
- command after it expands the command and before it
- executes it.
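-
- The options in the list above can be exercised with a short sketch. Every
- option is turned on inside a subshell or command substitution so that the
- calling shell is unaffected. The directory and variable names (demo_dir,
- DEMO_FLAG, DEMO_MISSING, and the result variables) are hypothetical, and
- the scratch directory location under TMPDIR or /tmp is an assumption.

```shell
# All names below are hypothetical and exist only for this sketch.
demo_dir=${TMPDIR:-/tmp}/set_demo.$$
mkdir "$demo_dir"
echo original > "$demo_dir/data"

# -a: a variable that is assigned is exported automatically.
allexport=$(set -a; DEMO_FLAG=on; sh -c 'echo "${DEMO_FLAG-unset}"')

# -C: > refuses to overwrite an existing file; >| overrides the refusal.
if ( set -C; echo new > "$demo_dir/data" ) 2>/dev/null; then
    noclobber=overwritten
else
    noclobber=refused
fi
( set -C; echo forced >| "$demo_dir/data" )
forced=$(cat "$demo_dir/data")

# -e: the failing simple command ends the inner subshell before echo runs.
errexit=$( ( set -e; false; echo "not reached" ); echo "exit status $?" )

# -e does not apply to the condition of if or to an OR list.
tested=$(set -e; if false; then :; fi; false || true; echo "reached")

# -f: pathname expansion is disabled, so * stays literal.
noglob=$(cd "$demo_dir"; set -f; echo *)

# -u: expanding an unset variable makes a noninteractive shell exit.
if ( set -u; unset DEMO_MISSING; : "$DEMO_MISSING" ) 2>/dev/null; then
    nounset=expanded
else
    nounset=refused
fi

rm -r "$demo_dir"
```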
-
- The default for all these options is off (unset) unless the shell was
- invoked with them on (see sh in 4.56). All the positional parameters
- shall be unset before any new values are assigned.
-
- The remaining arguments shall be assigned in order to the positional
- parameters. The special parameter # shall be set to reflect the number
- of positional parameters.
-
- The special argument "--" immediately following the set command name can
- be used to delimit the arguments if the first argument begins with + or
- -, or to prevent inadvertent listing of all shell variables when there
- are no arguments. The command set -- without _a_r_g_u_m_e_n_ts shall unset all
- positional parameters and set the special parameter # to zero.
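-
- The role of "--" can be sketched as follows. The value of x and the
- result variable names are hypothetical, and default field splitting (an
- unmodified IFS) is assumed.

```shell
# x is an example value that begins with a hyphen.
x='-n two words'

# Quoted: the whole value becomes $1; -- keeps -n from being read as an option.
set -- "$x"
quoted_count=$#
quoted_first=$1

# Unquoted: field splitting yields one positional parameter per word.
set -- $x
split_count=$#
split_second=$2

# No arguments: all positional parameters are unset and $# becomes 0.
set --
empty_count=$#
```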
-
- In the obsolescent version, the set command name followed by - with no
- other arguments shall turn off the -v and -x options without changing the
- positional parameters. The set command name followed by - with other
- arguments shall turn off the -v and -x options and assign the arguments
- to the positional parameters in order.
-
- _E_x_i_t__S_t_a_t_u_s
-
- Zero.
-
- BEGIN_RATIONALE
-
-
- 3.14.11.1 set Rationale. (_T_h_i_s _s_u_b_c_l_a_u_s_e _i_s _n_o_t _a _p_a_r_t _o_f _P_1_0_0_3._2)
-
- The set -- form is listed specifically in the Synopsis even though this
- usage is implied by the utility syntax guidelines. The explanation of
- this feature removes any ambiguity about whether the set -- form might be
- misinterpreted as being equivalent to set without any options or
- arguments. The functionality of this form has been adopted from the
- KornShell. In System V, set -- only unsets parameters if there is at
- least one argument; the only way to unset all parameters is to use shift.
- Using the KornShell version should not affect System V scripts because
- there should be no reason to deliberately issue it without arguments; if
- it were issued as, say:
-
- set -- "$@"
-
- and there were in fact no arguments resulting from $@, unsetting the
- parameters would be a no-op anyway.
-
- The set + form in earlier drafts was omitted as being an unnecessary
- duplication of set alone and not widespread historical practice.
-
- The noclobber option was changed to -C from the set -o noclobber option
- in previous drafts. The set -o is used in the KornShell to accept word-
- length option names, duplicating many of the single-letter names. The
- noclobber option was changed to a single letter so that the historical $-
- paradigm would not be broken; see 3.5.2.
-
- The following set flags were intentionally omitted with the following
- rationale:
-
- -h This flag is related to command name hashing, which is not
- required for an implementation. It is primarily a performance
- issue, which is outside the scope of this standard.
-
- -k The -k flag was originally added by Bourne to make it easier for
- users of prerelease versions of the shell. In early versions of
- the Bourne shell the construct set name=value had to be used to
- assign values to shell variables. The problem with -k is that
- the behavior affects parsing, virtually precluding writing any
- compilers. To explain the behavior of -k, it is necessary to
- describe the parsing algorithm, which is implementation defined.
- For example,
-
- set -k; echo name=value
-
- and
-
- set -k
- echo name=value
-
- behave differently. The interaction with functions is even more
- complex. What is more, the -k flag is never needed, since the
- command line could have been reordered.
-
- -t The -t flag is hard to specify and almost never used. The only
- known use could be done with here-documents. Moreover, the
- behavior of ksh and sh differs. The man page says that it
- exits after reading and executing one command. What is one
- command? If the input is date;date, sh executes both date
- commands, whereas ksh executes only the first.
-
- Consideration was given to rewriting set to simplify its confusing
- syntax. A specific suggestion was that the unset utility should be used
- to unset options instead of using the non-_g_e_t_o_p_t()-able +_o_p_t_i_o_n syntax.
- However, the conclusion was reached that people were satisfied with the
- existing practice of using +_o_p_t_i_o_n and there was no compelling reason to
- modify such widespread existing practice.
-
- Examples:
-
- Write out all variables and their values:
-
- set
-
- Set $1, $2, and $3 and set $# to 3:
-
- set c a b
-
- Turn on the -x and -v options:
-
- set -xv
-
- Unset all positional parameters:
-
- set --
-
- Set $1 to the value of x, even if x begins with - or +:
-
- set -- "$x"
-
- Set the positional parameters to the expansion of x, even if x expands
- with a leading - or +:
-
- set -- $x
-
- END_RATIONALE
-
-
- 3.14.12 shift - Shift positional parameters
-
- shift [_n]
-
- The positional parameters shall be shifted. Positional parameter 1 shall
- be assigned the value of parameter (1+_n), parameter 2 shall be assigned
- the value of parameter (2+_n), and so forth. The parameters represented
- by the numbers $# down to $#-_n+1 shall be unset, and the parameter #
- shall be updated to reflect the new number of positional parameters.
-
- The value _n shall be an unsigned decimal integer less than or equal to
- the value of the special parameter #. If _n is not given, it shall be
- assumed to be 1. If _n is 0, the positional and special parameters shall
- not be changed.
-
- _E_x_i_t__S_t_a_t_u_s
-
- The exit status shall be >0 if _n>$#; otherwise, it shall be zero.
-
- BEGIN_RATIONALE
-
-
- 3.14.12.1 shift Rationale. (_T_h_i_s _s_u_b_c_l_a_u_s_e _i_s _n_o_t _a _p_a_r_t _o_f _P_1_0_0_3._2)
-
- Example:
-
- set a b c d e
- shift 2
- echo $*
- c d e
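-
- A common use of shift is to consume arguments in a loop. The sketch below
- is one hand-written way to do so, not a replacement for getopts; the
- option letters, file names, and variable names are hypothetical, and an
- unmodified IFS is assumed when $* is expanded.

```shell
# Example arguments; in a script these would come from the command line.
set -- -v -o out.txt input1 input2

verbose=0
output=
while [ $# -gt 0 ]; do
    case $1 in
        -v) verbose=1
            shift ;;        # consume the flag
        -o) output=$2
            shift 2 ;;      # consume the flag and its value
        *)  break ;;        # first operand: stop shifting
    esac
done

# Whatever was not shifted away remains in the positional parameters.
remaining_count=$#
remaining="$*"
```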
-
- END_RATIONALE
-
-
- 3.14.13 trap - Trap signals
-
- trap [_a_c_t_i_o_n _c_o_n_d_i_t_i_o_n ...]
-
- If _a_c_t_i_o_n is -, the shell shall reset each _c_o_n_d_i_t_i_o_n to the default
- value. If _a_c_t_i_o_n is null (''), the shell shall ignore each of the
- specified _c_o_n_d_i_t_i_o_ns if they arise. Otherwise, the argument _a_c_t_i_o_n shall
- be read and executed by the shell when one of the corresponding
- conditions arises. The action of the trap shall override a previous
- action (either default action or one explicitly set). The value of $?
- after the trap action completes shall be the value it had before the trap
- was invoked.
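-
- The three kinds of _a_c_t_i_o_n described above can be sketched as follows.
- Each case runs in a child sh so that the signal reaches only that child.
- The choice of TERM, the echoed words, and the result variable names are
- hypothetical, and the sketch assumes TERM was not ignored on entry to the
- shell.

```shell
# An action string is executed when the condition arises; the script continues.
caught=$(sh -c 'trap "echo caught" TERM; kill -s TERM $$; echo continued')

# A null action makes the shell ignore the signal.
ignored=$(sh -c 'trap "" TERM; kill -s TERM $$; echo survived')

# An action of - restores the default, so TERM ends the child before it
# can write "survived"; only the outer echo produces output.
reset=$(sh -c 'trap "echo caught" TERM; trap - TERM; kill -s TERM $$; echo survived'
        echo "child ended")
```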
-
- The condition can be EXIT, 0 (equivalent to EXIT), or a signal specified
- using a symbolic name, without the SIG prefix, as listed in Required
- Signals and Job Control Signals (Table 3-1 and Table 3-2 in POSIX.1 {8}).
- (For example: HUP, INT, QUIT, TERM). Setting a trap for SIGKILL or
- SIGSTOP produces undefined results.
-
- The environment in which the shell executes a trap on EXIT shall be
- identical to the environment immediately after the last command executed
- before the trap on EXIT was taken.
-
- Each time the trap is invoked, the _a_c_t_i_o_n argument shall be processed in
- a manner equivalent to:
-
- eval "$action"
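-
- Because the action is evaluated each time the trap is taken, the quoting
- used when the trap is set decides when variables in it are expanded. The
- sketch below illustrates this with a trap on EXIT; the variable name stage
- and the message text are hypothetical, and each script is fed to a child
- sh through a here-document.

```shell
# Single quotes: the action text keeps $stage unexpanded, so the value is
# looked up when the EXIT trap runs, after the last command of the script.
deferred=$(sh <<'EOF'
stage=start
trap 'echo "exiting during: $stage"' EXIT
stage=finish
EOF
)

# Double quotes: $stage is expanded when trap is called, so the action
# text already contains the early value.
frozen=$(sh <<'EOF'
stage=start
trap "echo \"exiting during: $stage\"" EXIT
stage=finish
EOF
)
```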
-
- Signals that were ignored on entry to a noninteractive shell cannot be
- trapped or reset, although no error need be reported when attempting to
- do so. An interactive shell may reset or catch signals ignored on entry.
- Traps shall remain in place for a given shell until explicitly changed
- with another trap command.
-
- The trap command with no arguments shall write to standard output a list
- of commands associated with each condition. The format is:
-
- "trap -- %s %s ...\n", <_a_c_t_i_o_n>, <_c_o_n_d_i_t_i_o_n> ...
-
- The shell shall format the output, including the proper use of quoting,
- so that it is suitable for re-input to the shell as commands that achieve
- the same trapping results.
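-
- As a small, hedged illustration of this output, the sketch below sets one
- trap in a child sh and captures the listing. The action text and the HUP
- condition are example values; the exact spelling of the condition name in
- the listing may vary between implementations, so only the presence of the
- action and of HUP should be relied upon.

```shell
# Set a trap in a child shell, then ask that shell to list its traps.
listing=$(sh -c 'trap "echo bye" HUP; trap')

# Show the listing; it is intended to be suitable for re-input to the shell.
printf '%s\n' "$listing"
```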
-
- An implementation may allow numeric signal numbers for the conditions as
- an extension, if and only if the following map of signal numbers to names
- is true:
-
- Signal   Signal      Signal   Signal
- Number   Name        Number   Name
- ______   _______     ______   _______
-    1     SIGHUP         9     SIGKILL
-    2     SIGINT        14     SIGALRM
-    3     SIGQUIT       15     SIGTERM
-    6     SIGABRT
-
- Otherwise, it shall be an error for the application to use numeric signal
- numbers.
-
- The trap special built-in shall conform to the utility argument syntax
- guidelines described in 2.10.2.
-
- _E_x_i_t__S_t_a_t_u_s
-
- If the trap name or number is invalid, a nonzero exit status shall be
- returned; otherwise, zero shall be returned. For both interactive and
- noninteractive shells, invalid signal names or numbers shall not be
- considered a syntax error and shall not cause the shell to abort.
-
- BEGIN_RATIONALE
-
- 3.14.13.1 trap Rationale. (_T_h_i_s _s_u_b_c_l_a_u_s_e _i_s _n_o_t _a _p_a_r_t _o_f _P_1_0_0_3._2)
-
- Implementations may permit lowercase signal names as an extension.
- Implementations may also accept the names with the SIG prefix; no known
- historical shell does so. The trap and kill utilities in POSIX.2 are now
- consistent in their omission of the SIG prefix for signal names. Some
- kill implementations do not allow the prefix and kill -l lists the
- signals without prefixes.
-
- As stated previously, when a subshell is entered, traps are set to the
- default actions. This does not imply that the trap command cannot be
- used within the subshell to set new traps.
-
- Trapping SIGKILL or SIGSTOP is accepted by some historical
- implementations, but it does not work. Portable POSIX.2 applications
- should not attempt it.
-
- The output format is not historical practice. Since the output of
- historical traps is not portable (because numeric signal values are not
- portable) and had to change to become so, an opportunity was taken to
- format the output in a way that a shell script could use to save and then
- later reuse a trap if it wanted. For example:
-
- save_traps=$(trap)
- ...
- eval "$save_traps"
-
- The KornShell uses an ERR trap that is triggered whenever set -e would
- cause an exit. This is allowable as an extension, but was not mandated,
- as other shells have not used it.
-
- The text about the environment for the EXIT trap invalidates the behavior
- of some historical versions of interactive shells which, e.g., close the
- standard input before executing a trap on 0. For example, in some
- historical interactive shell sessions the following trap on 0 would
- always print --:
-
- trap 'read foo; echo "-$foo-"' 0
-
- Examples:
-
- Write out a list of all traps and actions:
-
- trap
-
- Set a trap so the logout utility in the HOME directory will execute when
- the shell terminates:
-
- trap '$HOME/logout' EXIT
-
- _o_r
- trap '$HOME/logout' 0
-
- Unset traps on INT, QUIT, TERM, and EXIT:
-
- trap - INT QUIT TERM EXIT
-
- END_RATIONALE
-
-
- 3.14.14 unset - Unset values and attributes of variables and functions
-
- unset [-fv] _n_a_m_e ...
-
- Each variable or function specified by _n_a_m_e shall be unset.
-
- If -v is specified, _n_a_m_e refers to a variable name and the shell shall
- unset it and remove it from the environment. Read-only variables cannot
- be unset.
-
- If -f is specified, _n_a_m_e refers to a function and the shell shall unset
- the function definition.
-
- If neither -f nor -v is specified, _n_a_m_e refers to a variable; if a
- variable by that name does not exist, it is unspecified whether a
- function by that name, if any, shall be unset.
-
- Unsetting a variable or function that was not previously set shall not be
- considered an error and shall not cause the shell to abort.
-
- The unset special built-in shall conform to the utility argument syntax
- guidelines described in 2.10.2.
-
- _E_x_i_t__S_t_a_t_u_s
-
- 0 All _n_a_m_es were successfully unset.
-
- >0 At least one _n_a_m_e could not be unset.
-
- BEGIN_RATIONALE
-
-
- 3.14.14.1 unset Rationale. (_T_h_i_s _s_u_b_c_l_a_u_s_e _i_s _n_o_t _a _p_a_r_t _o_f _P_1_0_0_3._2)
-
- Note that
-
- VARIABLE=
-
- is not equivalent to an unset of VARIABLE; in the example, VARIABLE is
- set to "". Also, the ``variables'' that can be unset should not be
- misinterpreted to include the special parameters (see 3.5.2).
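-
- The difference between a null value and an unset variable can be observed
- with the parameter expansion forms ${_n_a_m_e+_w_o_r_d} and ${_n_a_m_e-_w_o_r_d}. In this
- sketch the variable name DEMO_VAR and the result names are hypothetical.

```shell
# DEMO_VAR is a hypothetical name used only for this sketch.
DEMO_VAR=
empty_state=${DEMO_VAR+set}           # "set": the variable exists with a null value
empty_default=${DEMO_VAR-fallback}    # "": - substitutes only when unset

unset DEMO_VAR
unset_state=${DEMO_VAR+set}           # "": the variable no longer exists
unset_default=${DEMO_VAR-fallback}    # "fallback"
```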
-
- Consideration was given to omitting the -f option in favor of an
- unfunction utility, but the decision was made to retain existing practice.
-
- The -v option was introduced because System V historically used one name
- space for both variables and functions. When unset is used without
- options, System V historically unset either a function or a variable and
- there was no confusion about which one was intended. A portable POSIX.2
- application can use unset without an option to unset a variable, but not
- a function; the -f option must be used.
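-
- The separation of the two name spaces can be sketched as follows, for a
- shell that keeps variables and functions apart. The name demo_item, shared
- here by a variable and a function, is hypothetical, and the sketch assumes
- that no utility of that name exists in the search path.

```shell
# demo_item is a hypothetical name shared by a variable and a function.
demo_item=value
demo_item() { echo "function output"; }

# -v removes only the variable.
unset -v demo_item
var_after=${demo_item-unset}
func_after_v=$(demo_item)        # the function is still defined

# -f removes the function definition.
unset -f demo_item
if ( demo_item ) >/dev/null 2>&1; then
    func_after_f=defined
else
    func_after_f=unset
fi
```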
-
- Examples:
-
- Unset the VISUAL variable:
-
- unset -v VISUAL
-
- Unset the functions foo and bar:
-
- unset -f foo bar
-
- END_RATIONALE
-
-
- Section 4: Execution Environment Utilities
-
-
-
- The Execution Environment Utilities are the utilities that shall be
- implemented in all conforming POSIX.2 systems.
-
-
-
- 4.1 awk - Pattern scanning and processing language
-
-
- 4.1.1 Synopsis
-
- awk [-F _E_R_E] [-v _a_s_s_i_g_n_m_e_n_t] ... _p_r_o_g_r_a_m [_a_r_g_u_m_e_n_t ...]
-
- awk [-F _E_R_E] -f _p_r_o_g_f_i_l_e ... [-v _a_s_s_i_g_n_m_e_n_t] ... [_a_r_g_u_m_e_n_t ...]
-
-
- 4.1.2 Description
-
- The awk utility shall execute programs written in the _a_w_k programming
- language, which is specialized for textual data manipulation. An awk
- program is a sequence of patterns and corresponding actions. When input
- is read that matches a pattern, the action associated with that pattern
- shall be carried out.
-
- Input shall be interpreted as a sequence of records. By default, a
- record is a line, but this can be changed by using the RS built-in
- variable. Each record of input shall be matched in turn against each
- pattern in the program. For each pattern matched, the associated action
- shall be executed.
-
- The awk utility shall interpret each input record as a sequence of fields
- where, by default, a field is a string of non-<blank> characters. This
- default white space field delimiter can be changed by using the FS
- built-in variable or the -F _E_R_E option. The awk utility shall denote the
- first field in a record $1, the second $2, and so forth. The symbol $0 shall
- refer to the entire record; setting any other field shall cause the
- reevaluation of $0. Assigning to $0 shall reset the values of all other
- fields and the NF built-in variable.
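-
- The field rules above can be illustrated with a few one-line programs.
- This is a sketch on arbitrary sample input; the result variable names are
- hypothetical, and the default output field separator (a single space) is
- assumed.

```shell
# Fields are separated by blanks by default; $2 is the second field.
second=$(echo 'alpha beta gamma' | awk '{ print $2 }')

# Assigning to a field causes $0 to be rebuilt from the fields.
rebuilt=$(echo 'alpha beta gamma' | awk '{ $2 = "X"; print $0 }')

# Assigning to $0 splits the record again, resetting the fields and NF.
resplit=$(echo 'alpha beta gamma' | awk '{ $0 = "one two"; print NF, $2 }')

# -F changes the field separator; here a colon is used.
by_colon=$(echo 'root:x:0' | awk -F: '{ print $3 }')
```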
-
- 4.1.3 Options
-
- The awk utility shall conform to the utility argument syntax guidelines
- described in 2.10.2.
-
- The following options shall be supported by the implementation:
-
- -F _E_R_E Define the input field separator to be the extended
- regular expression _E_R_E, before any input is read (see
- 4.1.7.4).
-
- -f _p_r_o_g_f_i_l_e Specifies the pathname of the file _p_r_o_g_f_i_l_e containing an
- awk program. If multiple instances of this option are
- specified, the concatenation of the files specified as
- _p_r_o_g_f_i_l_e in the order specified shall be the awk program.
- The awk program can alternatively be specified in the
- command line as a single argument.
-
- -v _a_s_s_i_g_n_m_e_n_t
- The _a_s_s_i_g_n_m_e_n_t argument shall be in the same form as an
- _a_s_s_i_g_n_m_e_n_t operand. The specified variable assignment
- shall occur prior to executing the awk program, including
- the actions associated with BEGIN patterns (if any).
- Multiple occurrences of this option can be specified.
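-
- A short sketch of the -v and -f options follows. The file names init.awk
- and main.awk, the scratch directory under TMPDIR or /tmp, and the variable
- names are all hypothetical. It shows that a -v assignment is already
- visible in a BEGIN action, and that several -f files are concatenated in
- the order given.

```shell
# Hypothetical scratch directory holding two program fragments.
prog_dir=${TMPDIR:-/tmp}/awk_demo.$$
mkdir "$prog_dir"
printf '%s\n' 'BEGIN { prefix = "item" }' > "$prog_dir/init.awk"
printf '%s\n' '{ print prefix ": " $1 }'  > "$prog_dir/main.awk"

# -v: the assignment takes effect before the BEGIN action runs.
from_v=$(awk -v greeting=hello 'BEGIN { print greeting }' </dev/null)

# -f twice: the program is init.awk followed by main.awk.
from_files=$(echo 'apple' | awk -f "$prog_dir/init.awk" -f "$prog_dir/main.awk")

rm -r "$prog_dir"
```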
-
-
- 4.1.4 Operands
-
- The following operands shall be supported by the implementation:
-
- _p_r_o_g_r_a_m If no -f option is specified, the first operand to awk
- shall be the text of the awk program. The application
- shall supply the _p_r_o_g_r_a_m operand as a single argument to
- awk. If the text does not end in a <newline> character,
- awk shall interpret the text as if it did.
-
- _a_r_g_u_m_e_n_t Either of the following two types of _a_r_g_u_m_e_n_ts can be
- intermixed:
-
- _f_i_l_e A pathname of a file that contains the input to
- be read, which is matched against the set of
- patterns in the program. If no _f_i_l_e operands are
- specified, or if a _f_i_l_e operand is -, the
- standard input shall be used.
-
- _a_s_s_i_g_n_m_e_n_t
- An operand that begins with an underscore or
- alphabetic character from the portable character
- set (see Table 2-3 in 2.4), followed by a
- sequence of underscores, digits, and alphabetics
- from the portable character set, followed by the
- = character shall specify a variable assignment
- rather than a pathname. The characters before
- the = shall represent the name of an awk
- variable; if that name is an awk reserved word
- (see 4.1.7.7) the behavior is undefined. The
- characters following the equals-sign shall be
- interpreted as if they appeared in the awk
- program preceded and followed by a double-quote
- (") character, as a STRING token (see 4.1.7.7),
- except that if the last character is an unescaped
- backslash, it shall be interpreted as a literal
- backslash rather than as the first character of
- the sequence ``\"''. The variable shall be
- assigned the value of that STRING token. If that
- value is considered a _n_u_m_e_r_i_c _s_t_r_i_n_g (see
- 4.1.7.2), the variable shall also be assigned its
- numeric value. Each such variable assignment
- shall occur just prior to the processing of the
- following _f_i_l_e, if any. Thus, an assignment
- before the first _f_i_l_e argument shall be executed
- after the BEGIN actions (if any), while an
- assignment after the last _f_i_l_e argument shall
- occur before the END actions (if any). If there
- are no _f_i_l_e arguments, assignments shall be
- executed before processing the standard input.
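-
- The timing of _a_s_s_i_g_n_m_e_n_t operands can be observed with the sketch below.
- The variable names stage and limit and the sample record are hypothetical;
- the operand - stands for the standard input, so one assignment precedes
- the only input file and one follows it.

```shell
# stage is unset during BEGIN, "first" while input is read, "last" in END.
timeline=$(echo 'record' | awk '
    BEGIN { print "BEGIN sees [" stage "]" }
          { print "input sees [" stage "]" }
    END   { print "END sees [" stage "]" }
' stage=first - stage=last)

# A value that looks numeric is also given its numeric value, so the
# comparison below is numeric (10 > 9) rather than a string comparison.
compare=$(echo 'record' | awk '
    { kind = (limit > 9) ? "numeric" : "string"; print kind }
' limit=10)
```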
-
-
- 4.1.5 External Influences
-
-
- 4.1.5.1 Standard Input
-
- The standard input shall be used only if no _f_i_l_e operands are specified,
- or if a _f_i_l_e operand is -. See Input Files.
-
- 4.1.5.2 Input Files
-
- Input files to the awk program from any of the following sources:
-
- - Any _f_i_l_e operands or their equivalents, achieved by modifying the
- awk variables ARGV and ARGC
-
- - Standard input in the absence of any _f_i_l_e operands
-
- - Arguments to the getline function
-
- shall be text files. Whether the variable RS is set to a value other
- than <newline> or not, for these files, the implementation shall support
- records terminated with the specified separator up to {LINE_MAX} bytes
- and may support longer records.
-
- If -f _p_r_o_g_f_i_l_e is specified, the file(s) named by _p_r_o_g_f_i_l_e shall be text
- file(s) containing an awk program.
-
-
- 4.1.5.3 Environment Variables
-
- The following environment variables shall affect the execution of awk:
-
- LANG This variable shall determine the locale to use for
- the locale categories when both LC_ALL and the
- corresponding environment variable (beginning with
- LC_) do not specify a locale. See 2.6.
-
- LC_ALL This variable shall determine the locale to be used
- to override any values for locale categories
- specified by the settings of LANG or any
- environment variables beginning with LC_.
-
- LC_CTYPE This variable shall determine the locale for the
- interpretation of sequences of bytes of text data
- as characters (e.g., single- versus multibyte
- characters in arguments and input files), the
- behavior of character classes within regular
- expressions, the identification of characters as
- letters, and the mapping of upper- and lowercase
- characters for the toupper and tolower functions.
-
- LC_COLLATE This variable shall determine the locale for the
- behavior of ranges, equivalence classes, and
- multicharacter collating elements within regular
- expressions and in comparisons of string values.
-
- LC_MESSAGES This variable shall determine the language in which
- messages should be written.
-
- LC_NUMERIC This variable shall determine the radix character
- used when interpreting numeric input, performing
- conversions between numeric and string values, and
- formatting numeric output.
-
- PATH This variable shall define the search path when
- looking for commands executed by system(_e_x_p_r), or
- input and output pipes. See 2.6.
-
- In addition, all environment variables shall be visible via the awk
- variable ENVIRON.
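-
- For instance, an environment variable set for a single awk invocation can
- be read back through ENVIRON. The variable name DEMO_COLOR and its value
- are hypothetical.

```shell
# DEMO_COLOR is placed in the environment of this awk invocation only.
from_env=$(DEMO_COLOR=blue awk 'BEGIN { print ENVIRON["DEMO_COLOR"] }' </dev/null)
```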
-
-
- 4.1.5.4 Asynchronous Events
-
- Default.
-
-
- 4.1.6 External Effects
-
- 4.1.6.1 Standard Output
-
- The nature of the output files depends on the awk program.
-
-
- 4.1.6.2 Standard Error
-
- Used only for diagnostic messages.
-
- 4.1.6.3 Output Files
-
- The nature of the output files depends on the awk program.
-
-
- 4.1.7 Extended Description
-
-
- 4.1.7.1 Overall Program Structure
-
- An awk program is composed of pairs of the form:
-
- _p_a_t_t_e_r_n { _a_c_t_i_o_n }
-
- Either the pattern or the action (including the enclosing brace
- characters) can be omitted.
-
- A missing pattern shall match any record of input, and a missing action
- shall be equivalent to an action that writes the matched record of input
- to standard output.
-
- Execution of the awk program shall start by first executing the actions
- associated with all BEGIN patterns in the order they occur in the
- program. Then each _f_i_l_e operand (or standard input if no files were
- specified) shall be processed in turn by reading data from the file until
- a record separator is seen (<newline> by default), splitting the current
- record into fields using the current value of FS according to the rules
- in 4.1.7.4, evaluating each pattern in the program in the order of
- occurrence, and executing the action associated with each pattern that
- matches the current record. The action for a matching pattern shall be
- executed before evaluating subsequent patterns. Last, the actions
- associated with all END patterns shall be executed in the order they
- occur in the program.
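-
- The flow described above can be sketched as follows. This is an
- illustration only, assuming a POSIX shell with an awk implementation on
- the PATH; the sample input and the shell variable out are hypothetical.
-
```sh
# Rule 1: BEGIN action, run before any input is read.
# Rule 2: pattern with no action, so the matching record is written.
# Rule 3: action with no pattern, so it runs for every record.
# Rule 4: END action, run after the last record.
out=$(printf 'apple\nbanana\n' | awk '
BEGIN    { print "start" }
/banana/
         { n++ }
END      { print n " records" }
')
printf '%s\n' "$out"
```
-
- Because a pattern and its action must begin on the same line, the
- pattern /banana/ and the brace group on the following line form two
- separate rules.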
-
-
- 4.1.7.2 Expressions
-
-
- Table 4-1 - awk Expressions in Decreasing Precedence
- _________________________________________________________________________________________
-                                                   Semantic        Type of
- Syntax                 Name                       Definition      Result            Assoc
- _________________________________________________________________________________________
- ( expr )               Grouping                   C Standard {7}  type of expr      n/a
- _________________________________________________________________________________________
- $expr                  Field reference            4.1.7.2         string            n/a
- _________________________________________________________________________________________
- ++ lvalue              Pre-increment              C Standard {7}  numeric           n/a
- -- lvalue              Pre-decrement              C Standard {7}  numeric           n/a
- lvalue ++              Post-increment             C Standard {7}  numeric           n/a
- lvalue --              Post-decrement             C Standard {7}  numeric           n/a
- _________________________________________________________________________________________
- expr ^ expr            Exponentiation             4.1.7.2         numeric           right
- _________________________________________________________________________________________
- ! expr                 Logical not                C Standard {7}  numeric           n/a
- + expr                 Unary plus                 C Standard {7}  numeric           n/a
- - expr                 Unary minus                C Standard {7}  numeric           n/a
- _________________________________________________________________________________________
- expr * expr            Multiplication             C Standard {7}  numeric           left
- expr / expr            Division                   C Standard {7}  numeric           left
- expr % expr            Modulus                    4.1.7.2         numeric           left
- _________________________________________________________________________________________
- expr + expr            Addition                   C Standard {7}  numeric           left
- expr - expr            Subtraction                C Standard {7}  numeric           left
- _________________________________________________________________________________________
- expr expr              String concatenation       4.1.7.2         string            left
- _________________________________________________________________________________________
- expr < expr            Less than                  4.1.7.2         numeric           none
- expr <= expr           Less than or equal to      4.1.7.2         numeric           none
- expr != expr           Not equal to               4.1.7.2         numeric           none
- expr == expr           Equal to                   4.1.7.2         numeric           none
- expr > expr            Greater than               4.1.7.2         numeric           none
- expr >= expr           Greater than or equal to   4.1.7.2         numeric           none
- _________________________________________________________________________________________
- expr ~ expr            ERE match                  4.1.7.4         numeric           none
- expr !~ expr           ERE nonmatch               4.1.7.4         numeric           none
- _________________________________________________________________________________________
- expr in array          Array membership           4.1.7.2         numeric           left
- ( index ) in array     Multidimension array       4.1.7.2         numeric           left
-                        membership
- _________________________________________________________________________________________
- expr && expr           Logical AND                C Standard {7}  numeric           left
- _________________________________________________________________________________________
- expr || expr           Logical OR                 C Standard {7}  numeric           left
- _________________________________________________________________________________________
- expr1 ? expr2 : expr3  Conditional expression     C Standard {7}  type of selected  right
-                                                                   expr2 or expr3
- _________________________________________________________________________________________
- lvalue ^= expr         Exponentiation assignment  4.1.7.2         numeric           right
- lvalue %= expr         Modulus assignment         4.1.7.2         numeric           right
- lvalue *= expr         Multiplication assignment  C Standard {7}  numeric           right
- lvalue /= expr         Division assignment        C Standard {7}  numeric           right
- lvalue += expr         Addition assignment        C Standard {7}  numeric           right
- lvalue -= expr         Subtraction assignment     C Standard {7}  numeric           right
- lvalue = expr          Assignment                 C Standard {7}  type of expr      right
- _________________________________________________________________________________________
-
-
-
-
- Expressions describe computations used in patterns and actions. In
- Table 4-1, valid expression operations are given in groups from highest
- precedence first to lowest precedence last, with equal-precedence
- operators grouped between horizontal lines. In expression evaluation,
- higher precedence operators shall be evaluated before lower precedence
- operators. In this table expr, expr1, expr2, and expr3 represent any
- expression, while lvalue represents any entity that can be assigned to
- (i.e., on the left side of an assignment operator). The precise syntax
- of expressions is given in the grammar in 4.1.7.7.
-
- Each expression shall have either a string value, a numeric value, or
- both. Except as stated for specific contexts, the value of an expression
- shall be implicitly converted to the type needed for the context in which
- it is used. A string value shall be converted to a numeric value by the
- equivalent of the following calls to functions defined by the
- C Standard {7}:
-
- setlocale(LC_NUMERIC, "");
- numeric_value = atof(string_value);
-
- A numeric value that is exactly equal to the value of an integer (see
- 2.9.2.1) shall be converted to a string by the equivalent of a call to
- the sprintf function (see 4.1.7.6.2) with the string "%d" as the fmt
- argument and the numeric value being converted as the first and only expr
- argument. Any other numeric value shall be converted to a string by the
- equivalent of a call to the sprintf function with the value of the
- variable CONVFMT as the fmt argument and the numeric value being
- converted as the first and only expr argument. The result of the
- conversion is unspecified if the value of CONVFMT is not a floating-point
- format specification. This standard specifies no explicit conversions
- between numbers and strings. An application can force an expression to
- be treated as a number by adding zero to it, or can force it to be
- treated as a string by concatenating the null string ("") to it.
-
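- The conversions above can be exercised with a short sketch. It assumes
- a POSIX shell with an awk implementation that supports CONVFMT; the
- variable names x, y, a, b, c, and out are illustrative only.
-
```sh
out=$(awk 'BEGIN {
    CONVFMT = "%.2f"
    x = 3.14159; y = 42
    a = x ""          # non-integer value: converted with CONVFMT
    b = y ""          # integer value: converted as if by "%d"
    c = "3x" + 0      # string to number, as atof() would: leading "3" only
    print a, b, c
}')
printf '%s\n' "$out"   # 3.14 42 3
```
-
- Concatenating "" and adding 0 are the two idioms named above for
- forcing a string or a numeric interpretation.
-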
- A string value shall be considered to be a numeric string in the
- following case:
-
- (1) Any leading and trailing <blank>s shall be ignored.
-
- (2) If the first unignored character is a + or -, it shall be
- ignored.
-
- (3) If the remaining unignored characters would be lexically
- recognized as a NUMBER token (as described by the lexical
- conventions in 4.1.7.7), the string shall be considered a
- numeric string.
-
- If a - character is ignored in the above steps, the numeric value of the
- numeric string shall be the negation of the numeric value of the
- recognized NUMBER token. Otherwise the numeric value of the numeric
- string shall be the numeric value of the recognized NUMBER token.
- Whether or not a string is a numeric string shall be relevant only in
- contexts where that term is used in this clause.
-
- When an expression is used in a Boolean context (the first subexpression
- of a conditional expression, an expression operated on by logical NOT,
- logical AND, or logical OR, the second expression of a for statement, the
- expression of an if statement, or the expression of a while statement),
- if it has a numeric value, a value of zero shall be treated as false and
- any other value shall be treated as true. Otherwise, a string value of
- the null string shall be treated as false and any other value shall be
- treated as true.
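-
- A small sketch of the Boolean-context rule, assuming a POSIX shell and
- an awk on the PATH. It relies on the rule in 4.1.7.3 that a field whose
- text is a numeric string also has a numeric value; the input line and
- the variable out are illustrative.
-
```sh
out=$(printf '0\n' | awk '{
    print ($1  ? "true" : "false")   # field "0" has the numeric value 0
    print ("0" ? "true" : "false")   # string constant "0" is not the null string
    print (""  ? "true" : "false")   # null string
    print (0   ? "true" : "false")   # numeric zero
}')
printf '%s\n' "$out"
```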
-
- All arithmetic shall follow the semantics of floating point arithmetic as
- specified by the C Standard {7}; see 2.9.2.
-
- The value of the expression
-
- expr1 ^ expr2
-
- shall be equivalent to the value returned by the C Standard {7} function
- call
-
- pow(expr1, expr2)
-
- The expression
-
- lvalue ^= expr
-
- shall be equivalent to the C Standard {7} expression
-
- lvalue = pow(lvalue, expr)
-
- except that lvalue shall be evaluated only once. The value of the
- expression
-
- expr1 % expr2
-
- shall be equivalent to the value returned by the C Standard {7} function
- call
-
- fmod(expr1, expr2)
-
- The expression
-
- lvalue %= expr
-
- shall be equivalent to the C Standard {7} expression
-
- lvalue = fmod(lvalue, expr)
-
- except that lvalue shall be evaluated only once.
-
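- The pow() and fmod() equivalences can be observed directly. The sketch
- below assumes a POSIX shell and an awk on the PATH; the operands and the
- variable out are chosen only for illustration.
-
```sh
out=$(awk 'BEGIN {
    print 2 ^ 10           # pow(2, 10)
    print 2 ^ 3 ^ 2        # right associative: 2 ^ (3 ^ 2)
    print -7 % 3           # fmod(-7, 3): the sign follows the first operand
    print 7.5 % 2          # fmod() also accepts non-integer operands
    x = 3; x ^= 2; print x
}')
printf '%s\n' "$out"
```
-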
- Variables and fields shall be set by the assignment statement:
-
- lvalue = expression
-
- and the type of expression shall determine the resulting variable type.
- The assignment includes the arithmetic assignments (+=, -=, *=, /=, %=,
- ^=, ++, --) all of which produce a numeric result. The left-hand side of
- an assignment and the target of increment and decrement operators can be
- one of a variable, an array with index, or a field selector.
-
- The awk language shall supply arrays that are used for storing numbers or
- strings. Arrays need not be declared. They shall initially be empty,
- and their sizes shall change dynamically. The subscripts, or element
- identifiers, are strings, providing a type of associative array
- capability. An array name followed by a subscript within square brackets
- can be used as an lvalue and thus as an expression, as described in the
- grammar (see 4.1.7.7). Unsubscripted array names can be used in only the
- following contexts:
-
- - A parameter in a function definition or function call.
-
- - The NAME token following any use of the keyword in as specified in
- the grammar (see 4.1.7.7). If the name used in this context is not
- an array name, the behavior is undefined.
-
- A valid array index shall consist of one or more comma-separated
- expressions, similar to the way in which multidimensional arrays are
- indexed in some programming languages. Because awk arrays are really one
- dimensional, such a comma-separated list shall be converted to a single
- string by concatenating the string values of the separate expressions,
- each separated from the other by the value of the SUBSEP variable. Thus,
- the following two index operations shall be equivalent:
-
- var[expr1, expr2, ..., exprn]
- var[expr1 SUBSEP expr2 SUBSEP ... SUBSEP exprn]
-
- A multidimensioned index used with the in operator shall be
- parenthesized. The in operator, which tests for the existence of a
- particular array element, shall not cause that element to exist. Any
- other reference to a nonexistent array element shall automatically create
- it.
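-
- The subscript rules above can be sketched as follows, assuming a POSIX
- shell and an awk on the PATH. The array name grid, its keys, and the
- variable out are hypothetical.
-
```sh
out=$(awk 'BEGIN {
    grid[1, 2] = "x"                  # stored under the key 1 SUBSEP 2
    print ((1 SUBSEP 2) in grid)      # 1: the same element
    print ((1, 2) in grid)            # 1: parenthesized multidimensional index
    print (("a", "b") in grid)        # 0: the in test does not create the element
    n = 0; for (k in grid) n++
    print n                           # 1
    if (grid["a", "b"] == "") n = 0   # any other reference creates the element
    for (k in grid) n++
    print n                           # 2
}')
printf '%s\n' "$out"
```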
-
- Comparisons (with the <, <=, !=, ==, >, and >= operators) shall be made
- numerically if both operands are numeric or if one is numeric and the
- other has a string value that is a numeric string. Otherwise, operands
- shall be converted to strings as required and a string comparison shall
- be made using the locale-specific collation sequence. The value of the
- comparison expression shall be 1 if the relation is true, or 0 if the
- relation is false.
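-
- As an illustration of the two kinds of comparison, the sketch below
- compares two fields that are numeric strings and then two string
- constants. It assumes a POSIX shell and an awk on the PATH; the input
- and the variable out are illustrative.
-
```sh
out=$(printf '10 9\n' | awk '{
    print ($1 > $2)       # fields carry numeric values: 10 > 9 is true
    print ("10" > "9")    # string constants: "1" collates before "9"
}')
printf '%s\n' "$out"
```
-
- The parentheses are needed because an unparenthesized > in a print
- statement is an output redirection.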
-
-
- 4.1.7.3 Variables and Special Variables
-
- Variables can be used in an awk program by referencing them. With the
- exception of function parameters (see 4.1.7.6.2), they are not explicitly
- declared. Uninitialized scalar variables and array elements have both a
- numeric value of zero and a string value of the empty string.
-
- Field variables shall be designated by a $ followed by a number or
- numerical expression. The effect of the field number expression
- evaluating to anything other than a nonnegative integer is unspecified;
- uninitialized variables or string values need not be converted to numeric
- values in this context. New field variables can be created by assigning
- a value to them. References to nonexistent fields (i.e., fields after
- $NF) shall produce the null string. However, assigning to a nonexistent
- field [e.g., $(NF+2) = 5] shall increase the value of NF, create any
- intervening fields with the null string as their values, and cause the
- value of $0 to be recomputed, with the fields being separated by the
- value of OFS. Each field variable shall have a string value when
- created. If the string, with any occurrence of the decimal-point
- character from the current locale changed to a <period>, would be
- considered a numeric string (see 4.1.7.2), the field variable shall also
- have the numeric value of the numeric string.
-
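- A brief sketch of field creation, assuming a POSIX shell and an awk on
- the PATH; the input record, the OFS value, and the variable out are
- chosen for illustration.
-
```sh
out=$(printf 'a b\n' | awk 'BEGIN { OFS = "-" } {
    print NF, "[" $5 "]"   # 2 fields; $5 does not exist and yields the null string
    $(NF + 2) = "z"        # creates $3 (null) and $4, and rebuilds $0 using OFS
    print NF
    print $0
}')
printf '%s\n' "$out"
```
-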
- The implementation shall support the following other special variables
- that are set by awk:
-
- ARGC The number of elements in the ARGV array.
-
- ARGV An array of command line arguments, excluding options and
- the program argument, numbered from zero to ARGC-1.
-
- The arguments in ARGV can be modified or added to; ARGC
- can be altered. As each input file ends, awk shall treat
- the next nonnull element of ARGV, up through the current
- value of ARGC-1, as the name of the next input file.
- Thus, setting an element of ARGV to null means that it
- shall not be treated as an input file. The name '-' shall
- indicate the standard input. If an argument matches the
- format of an assignment operand, this argument shall be
- treated as an assignment rather than a file argument.
-
- CONVFMT The printf format for converting numbers to strings
- (except for output statements, where OFMT is used); "%.6g"
- by default.
-
- ENVIRON The variable ENVIRON is an array representing the value of
- the environment, as described in POSIX.1 {8} 2.7. The
- indices of the array shall be strings consisting of the
- names of the environment variables, and the value of each
- array element shall be a string consisting of the value of
- that variable. If the value of an environment variable is
- considered a numeric string (see 4.1.7.2), the array
- element shall also have its numeric value.
-
- In all cases where the behavior of awk is affected by
- environment variables [including the environment of any
- command(s) that awk executes via the system function or
- via pipeline redirections with the print statement, the
- printf statement, or the getline function], the
- environment used shall be the environment at the time awk
- began executing; it is implementation defined whether any
- modification of ENVIRON affects this environment.
-
- FILENAME A pathname of the current input file. Inside a BEGIN
- action the value is undefined. Inside an END action the
- value is the name of the last input file processed.
-
- FNR The ordinal number of the current record in the current
- file. Inside a BEGIN action the value is zero. Inside an
- END action the value is the number of the last record
- processed in the last file processed.
-
- FS Input field separator regular expression; <space> by
- default.
-
- NF The number of fields in the current record. Inside a
- BEGIN action, the use of NF is undefined unless a getline
- function without a var argument is executed previously.
- Inside an END action, NF shall retain the value it had for
- the last record read, unless a subsequent, redirected,
- getline function without a var argument is performed prior
- to entering the END action.
-
- NR The ordinal number of the current record from the start of
- input. Inside a BEGIN action the value is zero. Inside
- an END action the value is the number of the last record
- processed.
-
- OFMT The printf format for converting numbers to strings in
- output statements (see 4.1.7.6.1); "%.6g" by default. The
- result of the conversion is unspecified if the value of
- OFMT is not a floating-point format specification.
-
- OFS The print statement output field separation; <space> by
- default.
-
- ORS The print statement output record separator; <newline> by
- default.
-
- RLENGTH The length of the string matched by the match function.
-
- RS The first character of the string value of RS is the input
- record separator; <newline> by default. If RS contains
- more than one character, the results are unspecified. If
- RS is null, then records are separated by sequences of one
- or more blank lines, leading or trailing blank lines do
- not result in empty records at the beginning or end of the
- input, and <newline> is always a field separator, no
- matter what the value of FS is.
-
- RSTART The starting position of the string matched by the match
- function, numbering from 1. This is always equivalent to
- the return value of the match function.
-
- SUBSEP The subscript separator string for multidimensional
- arrays; the default value is implementation defined.
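-
- Several of these variables interact when RS is null. The sketch below
- assumes a POSIX shell and an awk on the PATH; the sample text and the
- variable out are illustrative only.
-
```sh
# The input has a leading blank line, two records separated by blank
# lines, and a trailing blank line.
out=$(printf '\nalpha beta\ngamma\n\n\ndelta\n\n' | awk '
BEGIN { RS = "" }
      { print NR ": " NF " fields, first=" $1 }
END   { print NR " records" }
')
printf '%s\n' "$out"
```
-
- The <newline> inside the first record acts as a field separator, so
- that record has three fields, and the surrounding blank lines do not
- produce empty records.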
-
-
- 4.1.7.4 Regular Expressions
-
- The awk utility shall make use of the extended regular expression
- notation (see 2.8.4) except that it shall allow the use of C-language
- conventions for escaping special characters within the EREs, as specified
- in Table 2-15 and Table 4-2; these escape sequences shall be recognized
- both inside and outside bracket expressions. Note that records need not
- be separated by <newline>s and string constants can contain <newline>s,
- so even the \n sequence is valid in awk EREs. Using a slash character
- within the regular expression requires the escaping shown in Table 4-2.
-
- A regular expression can be matched against a specific field or string by
- using one of the two regular expression matching operators, ~ and !~.
- These operators shall interpret their right-hand operand as a regular
- expression and their left-hand operand as a string. If the regular
- expression matches the string, the ~ expression shall evaluate to a value
- of 1, and the !~ expression shall evaluate to a value of 0. (The regular
- expression matching operation is as defined in 2.8.1.2, where a match
- occurs on any part of the string unless the regular expression is limited
- with the circumflex or dollar-sign special characters.) If the regular
- expression does not match the string, the ~ expression shall evaluate to
- a value of 0, and the !~ expression shall evaluate to a value of 1. If
- the right-hand operand is any expression other than the lexical token
- ERE, the string value of the expression shall be interpreted as an
- extended regular expression, including the escape conventions described
- above. Note that these same escape conventions also shall be applied in
- determining the value of a string literal (the lexical token STRING),
- and thus shall be applied a second time when a string literal is used in
- this context.
-
- When an ERE token appears as an expression in any context other than as
- the right-hand side of the ~ or !~ operator or as one of the built-in
- function arguments described below, the value of the resulting expression
- shall be the equivalent of
-
- $0 ~ /ere/
-
- The ERE argument to the gsub, match, and sub functions, and the fs
- argument to the split function (see 4.1.7.6.2) shall be interpreted as
- extended regular expressions. These can be either ERE tokens or arbitrary
- expressions, and shall be interpreted in the same manner as the right-hand
- side of the ~ or !~ operator.
-
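- The difference between an ERE token and a string used as a regular
- expression can be sketched as follows, assuming a POSIX shell and an awk
- on the PATH; the test strings and the variable out are illustrative.
-
```sh
out=$(awk 'BEGIN {
    print ("a.b" ~ /a\.b/)     # 1: the escaped period matches a literal "."
    print ("axb" ~ /a\.b/)     # 0
    print ("axb" !~ /a\.b/)    # 1: !~ yields the opposite value
    print ("axb" ~ "a\\.b")    # 0: the literal becomes a\.b, then is read as an ERE
    print ("axb" ~ "a.b")      # 1: an unescaped period matches any character
}')
printf '%s\n' "$out"
```
-
- The doubled backslash in the string literal reflects the second round
- of escape processing described above.
-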
- An extended regular expression can be used to separate fields by using
- the -F ERE option or by assigning a string containing the expression to
- the built-in variable FS. The default value of the FS variable shall be
- a single <space> character. The following describes FS behavior:
-
- (1) If FS is a single character:
-
- (a) If FS is <space>, skip leading and trailing <blank>s;
- fields shall be delimited by sets of one or more <blank>s.
-
- (b) Otherwise, if FS is any other character c, fields shall be
- delimited by each single occurrence of c.
-
- (2) Otherwise, the string value of FS shall be considered to be an
- extended regular expression. Each occurrence of a sequence
- matching the extended regular expression shall delimit fields.
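-
- The three FS cases can be compared side by side. The sketch assumes a
- POSIX shell and an awk on the PATH; the input strings and the shell
- variables a, b, and c are illustrative.
-
```sh
a=$(printf '  x   y  \n' | awk '{ print NF }')                  # <space>: blanks collapse
b=$(printf 'x,,y\n'      | awk -F, '{ print NF }')              # single character: each comma delimits
c=$(printf 'x1y22z\n'    | awk -F'[0-9]+' '{ print $1 $2 $3 }') # otherwise: FS is an ERE
printf '%s %s %s\n' "$a" "$b" "$c"   # 2 3 xyz
```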
-
- Except in the gsub, match, split, and sub built-in functions, regular
- expression matching shall be based on input records; i.e., record
- separator characters (the first character of the value of the variable
- RS, <newline> by default) cannot be embedded in the expression, and no
- expression shall match the record separator character. If the record
- separator is not <newline>, <newline> characters embedded in the
- expression can be matched. In those four built-in functions, regular
- expression matching shall be based on text strings; i.e., any character
- (including <newline> and the record separator) can be embedded in the
- pattern and an appropriate pattern shall match any character. However,
- in all awk regular expression matching, the use of one or more NUL
- characters in the pattern, input record, or text string produces
- undefined results.
-
-
- 4.1.7.5 Patterns
-
- A pattern is any valid expression, a range specified by two expressions
- separated by a comma, or one of the two special patterns BEGIN or END.
-
- 4.1.7.5.1 Special Patterns
-
- The awk utility shall recognize two special patterns, BEGIN and END.
- Each BEGIN pattern shall be matched once and its associated action
- executed before the first record of input is read [except possibly by use
- of the getline function (see 4.1.7.6.2) in a prior BEGIN action] and
- before command line assignment is done. Each END pattern shall be
- matched once and its associated action executed after the last record of
- input has been read. These two patterns shall have associated actions.
-
- BEGIN and END shall not combine with other patterns. Multiple BEGIN and
- END patterns shall be allowed. The actions associated with the BEGIN
- patterns shall be executed in the order specified in the program, as are
- the END actions. An END pattern can precede a BEGIN pattern in a
- program.
-
- If an awk program consists of only actions with the pattern BEGIN, and
- the BEGIN action contains no getline function, awk shall exit without
- reading its input when the last statement in the last BEGIN action is
- executed. If an awk program consists of only actions with the pattern
- END or only actions with the patterns BEGIN and END, the input shall be
- read before the statements in the END action(s) are executed.
-
- 4.1.7.5.2 Expression Patterns
-
- An expression pattern shall be evaluated as if it were an expression in a
- Boolean context. If the result is true, the pattern shall be considered
- to match, and the associated action (if any) shall be executed. If the
- result is false, the action shall not be executed.
-
- 4.1.7.5.3 Pattern Ranges
-
- A pattern range consists of two expressions separated by a comma; in this
- case, the action shall be performed for all records between a match of
- the first expression and the following match of the second expression,
- inclusive. At this point, the pattern range can be repeated starting at
- input records subsequent to the end of the matched range.
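-
- A pattern range that matches twice in one input can be sketched as
- follows, assuming a POSIX shell and an awk on the PATH. The marker lines
- on and off and the variable out are hypothetical.
-
```sh
# With no action given, each record inside a range is written as is.
out=$(printf 'a\non\nb\noff\nc\non\nd\noff\ne\n' | awk '/^on$/,/^off$/')
printf '%s\n' "$out"
```
-
- Records a, c, and e fall outside both ranges and are not written.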
-
-
- 4.1.7.6 Actions
-
- An action is a sequence of statements as shown in the grammar in 4.1.7.7.
- Any single statement can be replaced by a statement list enclosed in
- braces. The statements in a statement list shall be separated by
- <newline>s or semicolons, and shall be executed sequentially in the order
- that they appear.
-
- The expression acting as the conditional in an if statement shall be
- evaluated and if it is nonzero or nonnull, the following statement shall
- be executed; otherwise, if else is present, the statement following the
- else shall be executed.
-
- The if, while, do ... while, for, break, and continue statements are
- based on the C Standard {7} (see 2.9.2), except that the Boolean
- expressions shall be treated as described in 4.1.7.2, and except in the
- case of
-
- for (variable in array)
-
- which shall iterate, assigning each index of array to variable in an
- unspecified order. The results of adding new elements to array within
- such a for loop are undefined. If a break or continue statement occurs
- outside of a loop, the behavior is undefined.
-
- The delete statement shall remove an individual array element. Thus, the
- following code shall delete an entire array:
-
- for (i in array)
-     delete array[i]
-
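- The element-by-element deletion can be checked with a short sketch,
- assuming a POSIX shell and an awk on the PATH. The array name seen, the
- loop variable k, and the variable out are illustrative; k is used
- because index is also the name of a built-in function.
-
```sh
out=$(awk 'BEGIN {
    seen["a"]; seen["b"]; seen["c"]   # referencing an element creates it
    n = 0; for (k in seen) n++
    print n                           # 3
    for (k in seen) delete seen[k]    # remove the elements one at a time
    n = 0; for (k in seen) n++
    print n                           # 0
}')
printf '%s\n' "$out"
```
-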
- The next statement shall cause all further processing of the current
- input record to be abandoned. The behavior is undefined if a next
- statement appears or is invoked in a BEGIN or END action.
-
- The exit statement shall invoke all END actions in the order in which
- they occur in the program source and then terminate the program without
- reading further input. An exit statement inside an END action shall
- terminate the program without further execution of END actions. If an
- expression is specified in an exit statement, its numeric value shall be
- the exit status of awk, unless subsequent errors are encountered or a
- subsequent exit statement with an expression is executed.
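-
- The interaction between exit and END can be sketched as follows,
- assuming a POSIX shell and an awk on the PATH. The input, the exit value
- 3, and the shell variables out and status are illustrative.
-
```sh
status=0
out=$(printf '1\n2\n3\n' | awk '
NR == 2 { exit 3 }                 # stop reading input; END actions still run
        { print "record", $0 }
END     { print "END after", NR, "record(s)" }
') || status=$?
printf '%s\n(exit status %s)\n' "$out" "$status"
```
-
- The third record is never read, and the value given to exit becomes the
- exit status of awk because the END action does not override it.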
-
- 4.1.7.6.1 Output Statements
-
- Both print and printf statements shall write to standard output by
- default. The output shall be written to the location specified by
- output_redirection if one is supplied, as follows:
-
- > expression
- >> expression
- | expression
-
- In all cases, the expression shall be evaluated to produce a string that
- is used as a full pathname to write into (for > or >>) or as a command to
- be executed (for |). Using the first two forms, if the file of that name
- is not currently open, it shall be opened, creating it if necessary, and
- using the first form, truncating the file. The output then shall be
- appended to the file. As long as the file remains open, subsequent calls
- in which expression evaluates to the same string value simply shall
- append output to the file. The file remains open until the close
- function (see 4.1.7.6.2) is called with an expression that evaluates to
- the same string value.
-
- The third form shall write output onto a stream piped to the input of a
- command. The stream shall be created if no stream is currently open with
- the value of expression as its command name. The stream created shall be
- equivalent to one created by a call to the popen() function (see B.3.2)
- with the value of expression as the command argument and a value of "w"
- as the mode argument. As long as the stream remains open, subsequent
- calls in which expression evaluates to the same string value shall write
- output to the existing stream. The stream shall remain open until the
- close function (see 4.1.7.6.2) is called with an expression that
- evaluates to the same string value. At that time, the stream shall be
- closed as if by a call to the pclose() function (see B.3.2).
-
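- The piped form can be sketched as follows, assuming a POSIX shell with
- awk and sort on the PATH; the input lines and the variable out are
- illustrative.
-
```sh
out=$(printf 'pear\napple\nfig\n' | awk '
    { print | "sort" }             # one stream, opened on first use and then reused
END {
    close("sort")                  # waits for sort to finish, as pclose() would
    print "sorted", NR, "lines"
}')
printf '%s\n' "$out"
```
-
- Calling close before the final print is what places the summary line
- after the output of sort.
-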
- As described in detail by the grammar in 4.1.7.7, these output statements
- shall take a comma-separated list of expressions referred to in the grammar
- by the nonterminal symbols expr_list, print_expr_list, or
- print_expr_list_opt. This list is referred to here as the expression
- list, and each member is referred to as an expression argument.
-
- The print statement shall write the value of each expression argument
- onto the indicated output stream separated by the current output field
- separator (see variable OFS above), and terminated by the output record
- separator (see variable ORS above). All expression arguments shall be
- taken as strings, being converted if necessary; this conversion shall be
- as described in 4.1.7.2, with the exception that the printf format in
- OFMT shall be used instead of the value in CONVFMT. An empty expression
- list shall stand for the whole input record ($0).
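-
- The roles of OFS, ORS, and OFMT in the print statement can be sketched
- as follows, assuming a POSIX shell and an awk on the PATH. The separator
- values, the variables x and n, and the shell variable out are
- illustrative.
-
```sh
out=$(printf 'a b c\n' | awk '
BEGIN { OFS = "-"; ORS = "!\n"; OFMT = "%.2f" }
{
    x = 3.14159; n = 42
    print $1, x, n    # arguments joined with OFS; x is converted using OFMT
    print $1 x        # concatenation: no OFS, and CONVFMT (not OFMT) applies
    print             # empty expression list: the whole record
}')
printf '%s\n' "$out"
```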
-
- The printf statement shall produce output based on a notation similar to
- the File Format Notation used to describe file formats in this standard
- (see 2.12). Output shall be produced as specified with the first
- expression argument as the string <format> and subsequent expression
- arguments as the strings <arg1> through <argn>, with the following
- exceptions:
-
- (1) The format shall be an actual character string rather than a
- graphical representation. Therefore, it cannot contain empty
- character positions. The <space> character in the format
- string, in any context other than a flag of a conversion
- specification, shall be treated as an ordinary character that is
- copied to the output.
-
- (2) If the character set contains a W character and that character
- appears in the format string, it shall be treated as an ordinary
- character that is copied to the output.
-
- (3) The escape sequences beginning with a backslash character shall
- be treated as sequences of ordinary characters that are copied
- to the output. (Note that these same sequences shall be
- interpreted lexically by awk when they appear in literal
- strings, but they shall not be treated specially by the printf
- statement.)
-
- (4) A field width or precision can be specified as the * character
- instead of a digit string. In this case the next argument from
- the expression list shall be fetched and its numeric value taken
- as the field width or precision.
-
- (5) The implementation shall not precede or follow output from the d
- or u conversion specifications with <blank>s not specified by
- the format string.
-
- (6) The implementation shall not precede output from the o
- conversion specification with leading zeroes not specified by
- the format string.
-
- (7) For the c conversion specification: if the argument has a
- numeric value, the character whose encoding is that value shall
- be output. If the value is zero or is not the encoding of any
- character in the character set, the behavior is undefined. If
- the argument does not have a numeric value, the first character
- of the string value shall be output; if the string does not
- contain any characters the behavior is undefined.
-
- (8) For each conversion specification that consumes an argument, the
- next expression argument shall be evaluated. With the exception
- of the c conversion, the value shall be converted (according to
- the rules specified in 4.1.7.2) to the appropriate type for the
- conversion specification.
-
- (9) If there are insufficient expression arguments to satisfy all
- the conversion specifications in the format string, the behavior
- is undefined.
-
- (10) If any character sequence in the format string begins with a %
- character, but does not form a valid conversion specification,
- the behavior is unspecified.
-
- Both print and printf can output at least {LINE_MAX} bytes.
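-
- Some of the printf exceptions listed above can be sketched as follows.
- This is an illustration only; it assumes an awk on PATH, and the shell
- variable printf_demo is a hypothetical name.
-
```shell
# Sketch of printf behavior: "*" field width, the c conversion applied to
# a string argument, and d/o output without added blanks or leading zeros.
printf_demo=$(awk 'BEGIN {
    printf "%*d|%-5s|\n", 6, 42, "ab"    # "*" takes the width 6 from the argument list
    printf "%c\n", "hello"               # string argument: its first character is output
    printf "%d %o %s\n", 10, 8, "done"   # nothing is added beyond what the format specifies
}')
printf '%s\n' "$printf_demo"
```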
-
- 4.1.7.6.2 Functions
-
- The awk language has a variety of built-in functions: arithmetic, string,
- input/output, and general.
-
- 4.1.7.6.2.1 Arithmetic Functions
-
- The arithmetic functions, except for int, shall be based on the
- C Standard {7}; see 2.9.2. The behavior is undefined in cases where the
- C Standard {7} specifies that an error be returned or that the behavior
- is undefined.
-
- atan2(y,x) Return arctangent of y/x.
-
- cos(x) Return cosine of x, where x is in radians.
-
- sin(x) Return sine of x, where x is in radians.
-
- exp(x) Return the exponential function of x.
-
- log(x) Return the natural logarithm of x.
-
- sqrt(x) Return the square root of x.
-
- int(x) Truncate its argument to an integer. It shall be
- truncated toward 0 when x > 0.
-
- rand() Return a random number n, such that 0 <= n < 1.
-
- srand([expr]) Set the seed value for rand to expr or use the time
- of day if expr is omitted. The previous seed value
- shall be returned.
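-
- The arithmetic functions can be exercised with a short sketch such as
- the one below. It assumes an awk on PATH; the shell variable arith_demo
- and the seed value 42 are arbitrary choices made for illustration.
-
```shell
# Sketch: truncation with int, a few C-based math functions, and the
# repeatability of rand after srand is given the same seed.
arith_demo=$(awk 'BEGIN {
    print int(3.9)                  # positive values are truncated toward 0
    printf "%.4f\n", atan2(0, -1)   # arctangent of 0/-1, i.e. pi
    printf "%.4f\n", exp(1)
    print sqrt(16), log(1)
    srand(42); a = rand()
    srand(42); b = rand()           # same seed, same first value
    same = (a == b)
    in_range = (a >= 0 && a < 1)
    print same, in_range
}')
printf '%s\n' "$arith_demo"
```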
-
- 4.1.7.6.2.2 String Functions
-
- The string functions are:
-
- gsub(ere, repl[,in])
- Behave like sub (see below), except that it shall
- replace all occurrences of the regular expression
- (like the ed utility global substitute) in $0 or in
- the in argument, when specified.
-
- index(s, t) Return the position, in characters, numbering from
- 1, in string s where string t first occurs, or zero
- if it does not occur at all.
-
- length([s]) Return the length, in characters, of its argument
- taken as a string, or of the whole record, $0, if
- there is no argument.
-
- match(s, ere) Return the position, in characters, numbering from
- 1, in string s where the extended regular
- expression ere occurs, or zero if it does not occur
- at all. RSTART shall be set to the starting
- position (which is the same as the returned value),
- zero if no match is found; RLENGTH shall be set to
- the length of the matched string, -1 if no match is
- found.
-
- split(s, a[,fs]) Split the string s into array elements a[1], a[2],
- ... , a[n], and return n. The separation shall be
- done with the extended regular expression fs or
- with the field separator FS if fs is not given.
- Each array element shall have a string value when
- created. If the string assigned to any array
- element, with any occurrence of the decimal-point
- character from the current locale changed to a
- <period>, would be considered a numeric string (see
- 4.1.7.2), the array element shall also have the
- numeric value of the numeric string. The effect of
- a null string as the value of fs is unspecified.
-
- sprintf(fmt, expr, expr, ...)
- Format the expressions according to the printf
- format given by fmt and return the resulting
- string.
-
- sub(ere, repl[,in])
- Substitute the string repl in place of the first
- instance of the extended regular expression ere in
- the string argument in and return the number of
- substitutions. An ampersand (&) appearing in the
- string repl shall be replaced by the string from
- the in argument that matches the regular
- expression. An ampersand preceded by a backslash
- within repl shall be interpreted as a literal
- ampersand character. If the in argument is
- specified and it is not an lvalue (see 4.1.7.2),
- the behavior is undefined. If the in argument is
- omitted, awk shall substitute in the current record
- ($0).
-
- substr(s, m[,n])
- Return the at most n-character substring of s that
- begins at position m, numbering from 1. If n is
- missing, the length of the substring shall be
- limited by the length of the string s.
-
- tolower(s) Return a string based on the string s. Each
- character in s that is an uppercase letter
- specified to have a tolower mapping by the LC_CTYPE
- category of the current locale shall be replaced in
- the returned string by the lowercase letter
- specified by the mapping. Other characters in s
- shall be unchanged in the returned string.
-
- toupper(s) Return a string based on the string s. Each
- character in s that is a lowercase letter specified
- to have a toupper mapping by the LC_CTYPE category
- of the current locale shall be replaced in the
- returned string by the uppercase letter specified
- by the mapping. Other characters in s shall be
- unchanged in the returned string.
-
- All of the preceding functions that take ERE as a parameter expect a
- pattern or a string valued expression that is a regular expression as
- defined in 4.1.7.4.
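-
- The string functions described above can be tried with a sketch like the
- following. It assumes an awk on PATH; the sample strings and the shell
- variable string_demo are hypothetical.
-
```shell
# Sketch: each line exercises one or two of the string functions.
string_demo=$(awk 'BEGIN {
    s = "hello world"
    print index(s, "wor"), length(s)
    print substr(s, 1, 5), toupper(substr(s, 7))
    pos = match(s, /o+/)              # also sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH
    t = s; n = gsub(/o/, "0", t)      # replaces every match in t
    print n, t
    u = s; sub(/world/, "[&]", u)     # & stands for the matched text
    print u
    count = split("a:b:c", parts, ":")
    print count, parts[1], parts[3]
    print sprintf("%03d", 7)
}')
printf '%s\n' "$string_demo"
```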
-
- 4.1.7.6.2.3 Input/Output and General Functions
-
- The input/output and general functions are:
-
- close(expression) Close the file or pipe opened by a print or printf
- statement or a call to getline with the same
- string-valued expression. The limit on the number
- of open expression arguments is implementation
- defined. If the close was successful, the function
- shall return zero; otherwise, it shall return
- nonzero.
-
- expression | getline [var]
- Read a record of input from a stream piped from the
- output of a command. The stream shall be created
- if no stream is currently open with the value of
- expression as its command name. The stream created
- shall be equivalent to one created by a call to the
- popen() function with the value of expression as
- the command argument and a value of "r" as the mode
- argument. As long as the stream remains open,
- subsequent calls in which expression evaluates to
- the same string value shall read subsequent records
- from the stream. The stream shall remain open until
- the close function is called with an expression
- that evaluates to the same string value. At that
- time, the stream shall be closed as if by a call to
- the pclose() function. If var is missing, $0 and
- NF shall be set; otherwise, var shall be set.
-
- getline Set $0 to the next input record from the current
- input file. This form of getline shall set the NF,
- NR, and FNR variables.
-
- getline var Set variable var to the next input record from the
- current input file. This form of getline shall set
- the FNR and NR variables.
-
- getline [var] < expression
- Read the next record of input from a named file.
- The expression shall be evaluated to produce a
- string that is used as a full pathname. If the
- file of that name is not currently open, it shall
- be opened. As long as the stream remains open,
- subsequent calls in which expression evaluates to
- the same string value shall read subsequent records
- from the file. The file shall remain open until
- the close function is called with an expression
- that evaluates to the same string value. If var is
- missing, $0 and NF shall be set; otherwise, var
- shall be set.
-
- system(expression)
- Execute the command given by expression in a manner
- equivalent to the system() function [see B.3.1] and
- return the exit status of the command.
-
- All forms of getline shall return 1 for successful input, zero for end of
- file, and -1 for an error.
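-
- The input/output functions can be sketched as below. The fragment is
- illustrative only: it assumes an awk on PATH and the commonly available
- (but not universally specified) mktemp utility, and the names tmpfile,
- io_demo, cmd, rec, and last are hypothetical.
-
```shell
# Sketch: getline from a command and from a named file, plus close and system.
tmpfile=$(mktemp)
printf 'first\nsecond\n' > "$tmpfile"

io_demo=$(awk -v file="$tmpfile" 'BEGIN {
    cmd = "echo piped"
    cmd | getline line                  # reads one record from the command output
    status = close(cmd)                 # zero when the close succeeds
    print line, status
    while ((getline rec < file) > 0) {  # 1 per record read, 0 at end of file
        n++
        last = rec
    }
    close(file)
    print n, last
    print system("exit 3")              # exit status of the executed command
}')
rm -f "$tmpfile"
printf '%s\n' "$io_demo"
```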
-
- 4.1.7.6.2.4 User-Defined Functions
-
- The awk language also shall provide user-defined functions. Such
- functions can be defined as:
-
- function name(args,...) { statements }
-
- A function can be referred to anywhere in an awk program; in particular,
- its use can precede its definition. The scope of a function shall be
- global.
-
- Function arguments can be either scalars or arrays; the behavior is
- undefined if an array name is passed as an argument that the function
- uses as a scalar, or if a scalar expression is passed as an argument that
- the function uses as an array. Function arguments shall be passed by
- value if scalar and by reference if array name. Argument names shall be
- local to the function; all other variable names shall be global. The
- same name shall not be used as both an argument name and as the name of a
- function or a special awk variable. The same name shall not be used both
- as a variable name with global scope and as the name of a function. The
- same name shall not be used within the same scope both as a scalar
- variable and as an array.
-
- The number of parameters in the function definition need not match the
- number of parameters in the function call. Excess formal parameters can
- be used as local variables. If fewer arguments are supplied in a
- function call than are in the function definition, the extra parameters
- that are used in the function body as scalars shall be initialized with a
- string value of the null string and a numeric value of zero, and the
- extra parameters that are used in the function body as arrays shall be
- initialized as empty arrays. If more arguments are supplied in a
- function call than are in the function definition, the behavior is
- undefined.
-
- When invoking a function, no white space can be placed between the
- function name and the opening parenthesis. The implementation shall
- permit function calls to be nested, and for recursive calls to be made
- upon functions. Upon return from any nested or recursive function call,
- the values of all of the calling function's parameters shall be
- unchanged, except for array parameters passed by reference. The return
- statement can be used to return a value. If a return statement appears
- outside of a function definition, the behavior is undefined.
-
- In the function definition, <newline>s shall be optional before the
- opening brace and after the closing brace. Function definitions can
- appear anywhere in the program where a pattern-action pair is allowed.
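-
- The rules for user-defined functions can be sketched as follows. The
- function names fact, fill, and bump and the shell variable func_demo are
- hypothetical, and an awk on PATH is assumed.
-
```shell
# Sketch: recursion, array parameters passed by reference, scalar
# parameters passed by value, and an excess formal parameter used as a local.
func_demo=$(awk '
# Recursive call; n is local to each invocation.
function fact(n) { return (n <= 1) ? 1 : n * fact(n - 1) }

# arr is an array passed by reference, so the caller sees the new elements.
# i is an excess formal parameter that serves as a local variable.
function fill(arr, count,    i) {
    for (i = 1; i <= count; i++) arr[i] = i * i
}

# x is a scalar passed by value, so the caller does not see the assignment.
function bump(x) { x = x + 1; return x }

BEGIN {
    print fact(5)
    fill(squares, 3)
    print squares[1], squares[2], squares[3]
    v = 10; w = bump(v)
    print v, w
}')
printf '%s\n' "$func_demo"
```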
-
- 4.1.7.7 awk Grammar
-
- The grammar in this subclause and the lexical conventions in the
- following subclause shall together describe the syntax for awk programs.
- The general conventions for this style of grammar are described in 2.1.2.
- A valid program can be represented as the nonterminal symbol program in
- the grammar. Any discrepancies found between this grammar and other
- descriptions in this clause shall be resolved in favor of this grammar.
-
- %token NAME NUMBER STRING ERE NEWLINE
- %token FUNC_NAME /* name followed by '(' without white space */
-
- /* Keywords */
- %token Begin End
- /* 'BEGIN' 'END' */
-
- %token Break Continue Delete Do Else
- /* 'break' 'continue' 'delete' 'do' 'else' */
-
- %token Exit For Function If In
- /* 'exit' 'for' 'function' 'if' 'in' */
-
- %token Next Print Printf Return While
- /* 'next' 'print' 'printf' 'return' 'while' */
-
- /* Reserved function names */
- %token BUILTIN_FUNC_NAME /* one token for the following:
- * atan2 cos sin exp log sqrt int rand srand
- * gsub index length match split sprintf sub substr
- * tolower toupper close system
- */
- %token GETLINE /* Syntactically different from other built-ins */
-
- /* Two-character tokens */
- %token ADD_ASSIGN SUB_ASSIGN MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN POW_ASSIGN
- /* '+=' '-=' '*=' '/=' '%=' '^=' */
-
- %token OR AND NO_MATCH EQ LE GE NE INCR DECR APPEND
- /* '||' '&&' '!~' '==' '<=' '>=' '!=' '++' '--' '>>' */
-
- /* One-character tokens */
- %token '{' '}' '(' ')' '[' ']' ',' ';'
- %token '+' '-' '*' '%' '^' '!' '>' '<' '|' '?' ':' '~' '$' '='
-
- %start program
- %%
-
-
- program:
- item_list
- | actionless_item_list
- ;
-
- item_list:
- newline_opt
- | actionless_item_list item terminator
- | item_list item terminator
- | item_list action terminator
- ;
-
- actionless_item_list:
- item_list pattern terminator
- | actionless_item_list pattern terminator
- ;
-
- item:
- pattern action
- | Function NAME '(' param_list_opt ')' newline_opt action
- | Function FUNC_NAME '(' param_list_opt ')' newline_opt action
- ;
-
- param_list_opt:
- /* empty */
- | param_list
- ;
-
- param_list:
- NAME
- | param_list ',' NAME
- ;
-
- pattern:
- Begin
- | End
- | expr
- | expr ',' newline_opt expr
- ;
-
- action:
- '{' newline_opt '}'
- | '{' newline_opt terminated_statement_list '}'
- | '{' newline_opt unterminated_statement_list '}'
- ;
-
- terminator:
- ';'
- | NEWLINE
- | terminator NEWLINE ';'
- ;
-
- terminated_statement_list:
- terminated_statement
- | terminated_statement_list terminated_statement
- ;
-
- unterminated_statement_list:
- unterminated_statement
- | terminated_statement_list unterminated_statement
- ;
-
- terminated_statement:
- action newline_opt
- | If '(' expr ')' newline_opt terminated_statement
- Else newline_opt terminated_statement
- | While '(' expr ')' newline_opt terminated_statement
- | For '(' simple_statement_opt ';' expr_opt ';' simple_statement_opt ')'
- newline_opt terminated_statement
- | For '(' NAME In NAME ')' newline_opt terminated_statement
- | ';' newline_opt
- | terminatable_statement NEWLINE newline_opt
- | terminatable_statement ';' newline_opt
- ;
-
- unterminated_statement:
- terminatable_statement
- | If '(' expr ')' newline_opt unterminated_statement
- | If '(' expr ')' newline_opt terminated_statement
- Else newline_opt unterminated_statement
- | While '(' expr ')' newline_opt unterminated_statement
- | For '(' simple_statement_opt ';' expr_opt ';' simple_statement_opt ')'
- newline_opt unterminated_statement
- | For '(' NAME In NAME ')' newline_opt unterminated_statement
- ;
-
- terminatable_statement:
- simple_statement
- | Break
- | Continue
- | Next
- | Exit expr_opt
- | Return expr_opt
- | Do newline_opt terminated_statement While '(' expr ')'
- ;
-
- simple_statement_opt:
- /* empty */
- | simple_statement
- ;
-
- simple_statement:
- Delete NAME '[' expr_list ']'
- | expr
- | print_statement
- ;
-
- print_statement:
- simple_print_statement
- | simple_print_statement output_redirection
- ;
-
- simple_print_statement:
- Print print_expr_list_opt
- | Print '(' multiple_expr_list ')'
- | Printf print_expr_list
- | Printf '(' multiple_expr_list ')'
- ;
-
- output_redirection:
- '>' expr
- | APPEND expr
- | '|' expr
- ;
-
- expr_list_opt:
- /* empty */
- | expr_list
- ;
-
- expr_list:
- expr
- | multiple_expr_list
- ;
-
- multiple_expr_list:
- expr ',' newline_opt expr
- | multiple_expr_list ',' newline_opt expr
- ;
-
- expr_opt:
- /* empty */
- | expr
- ;
-
- expr:
- unary_expr
- | non_unary_expr
- ;
-
- unary_expr:
- '+' expr
- | '-' expr
- | unary_expr '^' expr
- | unary_expr '*' expr
- | unary_expr '/' expr
- | unary_expr '%' expr
- | unary_expr '+' expr
- | unary_expr '-' expr
- | unary_expr non_unary_expr
- | unary_expr '<' expr
- | unary_expr LE expr
- | unary_expr NE expr
- | unary_expr EQ expr
- | unary_expr '>' expr
- | unary_expr GE expr
- | unary_expr '~' expr
- | unary_expr NO_MATCH expr
- | unary_expr In NAME
- | unary_expr AND newline_opt expr
- | unary_expr OR newline_opt expr
- | unary_expr '?' expr ':' expr
- | unary_input_function
- ;
-
- non_unary_expr:
- '(' expr ')'
- | '!' expr
- | non_unary_expr '^' expr
- | non_unary_expr '*' expr
- | non_unary_expr '/' expr
- | non_unary_expr '%' expr
- | non_unary_expr '+' expr
- | non_unary_expr '-' expr
- | non_unary_expr non_unary_expr
- | non_unary_expr '<' expr
- | non_unary_expr LE expr
- | non_unary_expr NE expr
- | non_unary_expr EQ expr
- | non_unary_expr '>' expr
- | non_unary_expr GE expr
- | non_unary_expr '~' expr
- | non_unary_expr NO_MATCH expr
- | non_unary_expr In NAME
- | '(' multiple_expr_list ')' In NAME
- | non_unary_expr AND newline_opt expr
- | non_unary_expr OR newline_opt expr
- | non_unary_expr '?' expr ':' expr
- | NUMBER
- | STRING
- | lvalue
- | ERE
- | lvalue INCR
- | lvalue DECR
- | INCR lvalue
- | DECR lvalue
- | lvalue POW_ASSIGN expr
- | lvalue MOD_ASSIGN expr
- | lvalue MUL_ASSIGN expr
- | lvalue DIV_ASSIGN expr
- | lvalue ADD_ASSIGN expr
- | lvalue SUB_ASSIGN expr
- | lvalue '=' expr
- | FUNC_NAME '(' expr_list_opt ')' /* no white space allowed */
- | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
- | BUILTIN_FUNC_NAME
- | non_unary_input_function
- ;
-
- print_expr_list_opt:
- /* empty */
- | print_expr_list
- ;
-
- print_expr_list:
- print_expr
- | print_expr_list ',' newline_opt print_expr
- ;
-
- print_expr:
- unary_print_expr
- | non_unary_print_expr
- ;
-
- unary_print_expr:
- '+' print_expr
- | '-' print_expr
- | unary_print_expr '^' print_expr
- | unary_print_expr '*' print_expr
- | unary_print_expr '/' print_expr
- | unary_print_expr '%' print_expr
- | unary_print_expr '+' print_expr
- | unary_print_expr '-' print_expr
- | unary_print_expr non_unary_print_expr
- | unary_print_expr '~' print_expr
- | unary_print_expr NO_MATCH print_expr
- | unary_print_expr In NAME
- | unary_print_expr AND newline_opt print_expr
- | unary_print_expr OR newline_opt print_expr
- | unary_print_expr '?' print_expr ':' print_expr
- ;
-
- non_unary_print_expr:
- '(' expr ')'
- | '!' print_expr
- | non_unary_print_expr '^' print_expr
- | non_unary_print_expr '*' print_expr
- | non_unary_print_expr '/' print_expr
- | non_unary_print_expr '%' print_expr
- | non_unary_print_expr '+' print_expr
- | non_unary_print_expr '-' print_expr
- | non_unary_print_expr non_unary_print_expr
- | non_unary_print_expr '~' print_expr
- | non_unary_print_expr NO_MATCH print_expr
- | non_unary_print_expr In NAME
- | '(' multiple_expr_list ')' In NAME
- | non_unary_print_expr AND newline_opt print_expr
- | non_unary_print_expr OR newline_opt print_expr
- | non_unary_print_expr '?' print_expr ':' print_expr
- | NUMBER
- | STRING
- | lvalue
- | ERE
- | lvalue INCR
- | lvalue DECR
- | INCR lvalue
- | DECR lvalue
- | lvalue POW_ASSIGN print_expr
- | lvalue MOD_ASSIGN print_expr
- | lvalue MUL_ASSIGN print_expr
- | lvalue DIV_ASSIGN print_expr
- | lvalue ADD_ASSIGN print_expr
- | lvalue SUB_ASSIGN print_expr
- | lvalue '=' print_expr
- | FUNC_NAME '(' expr_list_opt ')' /* no white space allowed */
- | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
- | BUILTIN_FUNC_NAME
- ;
-
- lvalue:
- NAME
- | NAME '[' expr_list ']'
- | '$' expr
- ;
-
- non_unary_input_function:
- simple_get
- | simple_get '<' expr
- | non_unary_expr '|' simple_get
- ;
-
- unary_input_function:
- unary_expr '|' simple_get
- ;
-
- simple_get:
- GETLINE
- | GETLINE lvalue
- ;
-
- newline_opt:
- /* empty */
- | newline_opt NEWLINE
- ;
-
- This grammar has several ambiguities that shall be resolved as follows:
-
- - Operator precedence and associativity shall be as described in
- Table 4-1.
-
- - In case of ambiguity, an else shall be associated with the most
- immediately preceding if that would satisfy the grammar.
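-
- The two resolution rules can be illustrated with a short sketch. It
- assumes an awk on PATH and the usual Table 4-1 ordering (exponentiation
- right associative, multiplication above addition); grammar_demo is a
- hypothetical shell variable name.
-
```shell
# Sketch: how the grammar ambiguities are resolved in practice.
grammar_demo=$(awk 'BEGIN {
    x = 1; y = 0
    # The else is associated with the nearest if, that is, with "if (y)".
    if (x) if (y) print "inner if"; else print "inner else"
    x = 0
    # Here the outer condition is false, so neither branch of the inner if runs.
    if (x) if (y) print "inner if"; else print "not reached"
    # ^ is right associative, and * binds tighter than +.
    print 2 ^ 3 ^ 2, 2 + 3 * 4
    # Inside print, a comparison using > is parenthesized, because an
    # unparenthesized > is an output redirection in the grammar.
    print (2 > 1)
}')
printf '%s\n' "$grammar_demo"
```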
-
-
- 4.1.7.8 awk Lexical Conventions
-
- The lexical conventions for awk programs, with respect to the preceding
- grammar, shall be as follows:
-
- (1) Except as noted, awk shall recognize the longest possible token
- or delimiter beginning at a given point.
-
- (2) A comment shall consist of any characters beginning with the
- number sign character and terminated by, but excluding the next
- occurrence of, a <newline> character. Comments shall have no
- effect, except to delimit lexical tokens.
-
- (3) The character <newline> shall be recognized as the token
- NEWLINE.
-
- (4) A backslash character immediately followed by a <newline>
- character shall have no effect.
-
- (5) The token STRING shall represent a string constant. A string
- constant shall begin with the character ". Within a string
- constant, a backslash character shall be considered to begin an
- escape sequence as specified in Table 2-15 (see 2.12). In
- addition, the escape sequences in Table 4-2 shall be recognized.
- A <newline> character shall not occur within a string constant.
- A string constant shall be terminated by the first unescaped
- occurrence of the character " after the one that begins the
- string constant. The value of the string shall be the sequence
- of all unescaped characters and values of escape sequences
- between, but not including, the two delimiting " characters.
-
-
- Table 4-2 - awk Escape Sequences
- ______________________________________________________________________
- Escape
- Sequence  Description               Meaning
- ______________________________________________________________________
-
- \"        <backslash>               <quotation-mark> character
-           <quotation-mark>
-
- \/        <backslash> <slash>       <slash> character
-
- \ddd      <backslash> followed by   The character whose encoding is
-           the longest sequence of   represented by the one-, two-, or
-           one, two, or three        three-digit octal integer. If the
-           octal-digit characters    size of a byte on the system is
-           (01234567). If all of     greater than nine bits, the valid
-           the digits are 0, (i.e.,  escape sequence used to represent
-           representation of the     a byte is implementation defined.
-           NUL character), the       Multibyte characters require
-           behavior is undefined.    multiple, concatenated escape
-                                     sequences of this type, including
-                                     the leading \ for each byte.
-
- \c        <backslash> followed by   Undefined
-           any character not
-           described in this table
-           or in Table 2-15
- ______________________________________________________________________
-
- (6) The token ERE represents an extended regular expression
- constant. An ERE constant shall begin with the slash character.
- Within an ERE constant, a <backslash> character shall be
- considered to begin an escape sequence as specified in
- Table 2-15 (see 2.12). In addition, the escape sequences in Table 4-2
- shall be recognized. A <newline> character shall not occur
- within an ERE constant. An ERE constant shall be terminated by
- the first unescaped occurrence of the slash character after the
- one that begins the ERE constant. The extended regular
- expression represented by the ERE constant shall be the sequence
- of all unescaped characters and values of escape sequences
- between, but not including, the two delimiting slash characters.
-
- (7) A <blank> shall have no effect, except to delimit lexical tokens
- or within STRING or ERE tokens.
-
- (8) The token NUMBER shall represent a numeric constant. Its form
- and numeric value shall be equivalent to either of the
- tokens floating-constant or integer-constant as specified by the
- C Standard {7}, with the following exceptions:
-
- (a) An integer constant cannot begin with 0x or include the
- hexadecimal digits a, b, c, d, e, f, A, B, C, D, E, or F.
-
- (b) The value of an integer constant beginning with 0 shall be
- taken in decimal rather than octal.
-
- (c) An integer constant cannot include a suffix (u, U, l, or
- L).
-
- (d) A floating constant cannot include a suffix (f, F, l, or
- L).
-
- If the value is too large or too small to be representable (see
- 2.9.2.1), the behavior is undefined.
-
- (9) A sequence of underscores, digits, and alphabetics from the
- portable character set (see 2.4), beginning with an underscore
- or alphabetic, shall be considered a word.
-
- (10) The following words are keywords that shall be recognized as
- individual tokens; the name of the token is the same as the
- keyword:
-
- BEGIN     delete    for       in        printf
- END       do        function  next      return
- break     else      getline   print     while
- continue  exit      if
-
- (11) The following words are names of built-in functions and shall be
- recognized as the token BUILTIN_FUNC_NAME:
-
- atan2   index   match  sprintf  substr
- close   int     rand   sqrt     system
- cos     length  sin    srand    tolower
- exp     log     split  sub      toupper
- gsub
-
- The above-listed keywords and names of built-in functions are
- considered reserved words.
-
- (12) The token NAME shall consist of a word that is not a keyword or
- a name of a built-in function and is not followed immediately
- (without any delimiters) by the ( character.
-
- (13) The token FUNC_NAME shall consist of a word that is not a
- keyword or a name of a built-in function, followed immediately
- (without any delimiters) by the ( character. The ( character
- shall not be included as part of the token.
-
- (14) The following two-character sequences shall be recognized as the
- named tokens:
-
- Token Name  Sequence    Token Name  Sequence
- __________  ________    __________  ________
- ADD_ASSIGN  +=          NO_MATCH    !~
- SUB_ASSIGN  -=          EQ          ==
- MUL_ASSIGN  *=          LE          <=
- DIV_ASSIGN  /=          GE          >=
- MOD_ASSIGN  %=          NE          !=
- POW_ASSIGN  ^=          INCR        ++
- OR          ||          DECR        --
- AND         &&          APPEND      >>
-
- (15) The following single characters shall be recognized as tokens
- whose names are the character:
-
- <newline> { } ( ) [ ] , ; + - * % ^ ! > < | ? : ~ $ =
-
- There is a lexical ambiguity between the token ERE and the tokens / and
- DIV_ASSIGN. When an input sequence begins with a slash character in any
- syntactic context where the token / or DIV_ASSIGN could appear as the
- next token in a valid program, the longer of those two tokens that can be
- recognized shall be recognized. In any other syntactic context where the
- token ERE could appear as the next token in a valid program, the token
- ERE shall be recognized.
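-
- A few of the lexical conventions can be observed with the sketch below.
- It assumes an awk on PATH and an ASCII-compatible character encoding (so
- that octal 101 and 102 encode A and B); lex_demo is a hypothetical shell
- variable name.
-
```shell
# Sketch: string and ERE escapes, the slash ambiguity, and longest-token matching.
lex_demo=$(awk 'BEGIN {
    print "say \"hi\""                  # \" yields a quotation mark inside a string
    print "\101\102"                    # octal escapes, one per byte
    if ("a/b" ~ /a\/b/) print "match"   # \/ keeps the slash from ending the ERE
    total = 12
    total /= 4                          # after an lvalue, /= is the DIV_ASSIGN token
    print total / 3                     # after an operand, / is division, not an ERE
    print 1e2 + .5                      # floating constant forms as in C
    x = 5
    y = x++ + 1                         # longest token first: "++" is INCR
    print y, x
}')
printf '%s\n' "$lex_demo"
```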
-
- 4.1.8 Exit Status
-
- The awk utility shall exit with one of the following values:
-
- 0 All input files were processed successfully.
-
- >0 An error occurred.
-
- The exit status can be altered within the program by using an exit
- expression.
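-
- As a brief sketch of how a program can set its own exit status (assuming
- an awk on PATH; the shell variable names are hypothetical):
-
```shell
# Sketch: the exit statement with an expression sets the exit status of awk.
status_explicit=0
awk 'BEGIN { exit 3 }' || status_explicit=$?

# An END action can choose the status after all input has been read.
status_ok=0
printf 'a\nb\n' | awk '{ n++ } END { exit (n == 2 ? 0 : 1) }' || status_ok=$?

echo "$status_explicit $status_ok"
```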
-
-
- 4.1.9 Consequences of Errors
-
- If any file operand is specified and the named file cannot be accessed,
- awk shall write a diagnostic message to standard error and terminate
- without any further action.
-
- If the program specified by either the program operand or the progfile
- operand(s) is not a valid awk program (as specified in 4.1.7), the
- behavior is undefined.
-
- BEGIN_RATIONALE
-
-
- 4.1.10 Rationale. (_T_h_i_s _s_u_b_c_l_a_u_s_e _i_s _n_o_t _a _p_a_r_t _o_f _P_1_0_0_3._2)
-
- _E_x_a_m_p_l_e_s_,__U_s_a_g_e
-
- The awk program specified in the command line is most easily specified
- within single-quotes (e.g., 'program') for applications using sh, because
- awk programs commonly contain characters that are special to the shell,
- including double-quotes. In the cases where an awk program contains
- single-quote characters, it is usually easiest to specify most of the
- program as strings within single-quotes concatenated by the shell with
- quoted single-quote characters. For example,
-
- awk '/'\''/ { print "quote:", $0 }'
-
- prints all lines from the standard input containing a single-quote
- character, prefixed with quote:.
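-
- The quoting technique can be tried with a short sketch, assuming sh and
- awk; the wrapper name quote_lines is hypothetical. The shell joins the
- three pieces '/', \' and '/ { ... }' into one awk program argument.

```shell
# quote_lines: hypothetical wrapper around the command shown above.
quote_lines() {
    awk '/'\''/ { print "quote:", $0 }'
}
printf "it's here\nno quote\n" | quote_lines
```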
-
- The following are examples of simple awk programs:
-
- (1) Write to the standard output all input lines for which field 3
- is greater than 5.
-
- $3 > 5
-
- (2) Write every tenth line.
-
- (NR % 10) == 0
-
- (3) Write any line with a substring matching the regular expression.
-
- /(G|D)(2[0-9][[:alpha:]]*)/
-
- (4) Write any line in which the second field matches the regular
- expression and the fourth field does not.
-
- $2 ~ /xyz/ && $4 !~ /xyz/
-
- (5) Write any line in which the second field contains a backslash.
-
- $2 ~ /\\/
-
- (6) Write any line in which the second field contains a backslash.
- Note that backslash escapes are interpreted twice, once in
- lexical processing of the string and once in processing the
- regular expression.
-
- $2 ~ "\\\\"
-
- (7) Write the second to the last and the last field in each line.
- Separate the fields by a colon.
-
- {OFS=":";print $(NF-1), $NF}
-
- (8) Write the line number and number of fields in each line. The
- three strings representing the line number, the colon and the
- number of fields are concatenated and that string is written to
- standard output.
-
- {print NR ":" NF}
-
- (9) Write lines longer than 72 characters.
-
- length($0) > 72
-
- (10) Write first two fields in opposite order separated by the OFS:
-
- { print $2, $1 }
-
- (11) Same, with input fields separated by comma and/or <space>s and
- <tab>s:
-
- BEGIN { FS = ",[ \t]*|[ \t]+" }
- { print $2, $1 }
-
- (12) Add up first column, print sum and average.
-
- {s += $1 }
- END {print "sum is ", s, " average is", s/NR}
-
- (13) Write fields in reverse order, one per line (many lines out for
- each line in):
-
- { for (i = NF; i > 0; --i) print $i }
-
- (14) Write all lines between occurrences of the strings start and
- stop:
-
- /start/, /stop/
-
- (15) Write all lines whose first field is different from the previous
- one:
-
- $1 != prev { print; prev = $1 }
-
- (16) Simulate echo:
-
- BEGIN {
-     for (i = 1; i < ARGC; ++i)
-         printf "%s%s", ARGV[i], (i == ARGC-1 ? "\n" : " ")
- }
-
- (17) Write the path prefixes contained in the PATH environment
- variable, one per line:
-
- BEGIN {
- n = split (ENVIRON["PATH"], path, ":")
- for (i = 1; i <= n; ++i)
- print path[i]
- }
-
- (18) If there is a file named ``input'' containing page headers of
- the form:
-
- Page #
-
- and a file named ``program'' that contains:
-
- /Page/{ $2 = n++; }
- { print }
-
- then the command line:
-
- awk -f program n=5 input
-
- will print the file ``input,'' filling in page numbers starting
- at 5.
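-
- Example (18) can be reproduced with the following sketch. It assumes
- awk and the widely available (but not POSIX.2) mktemp utility; the
- function name number_pages and the sample file contents are hypothetical.

```shell
# number_pages: hypothetical helper that recreates the files from
# example (18) in a scratch directory and runs the command line shown.
number_pages() {
    dir=$(mktemp -d) || return 1
    printf 'Page #\ntext\nPage #\n' > "$dir/input"
    printf '/Page/{ $2 = n++; }\n{ print }\n' > "$dir/program"
    # The operand n=5 is an assignment processed before "input" is read.
    (cd "$dir" && awk -f program n=5 input)
    rm -rf "$dir"
}
number_pages
```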
-
- The index, length, match, and substr functions should not be confused
- with similar functions in the C Standard {7}; the awk versions deal with
- characters, while the C Standard {7} deals with bytes.
-
- To forestall any possible confusion, where strings are used as the name
- of a file or pipeline, the strings must be textually identical. The
- terminology ``same string value'' implies that ``equivalent strings,''
- even those that differ only by <space>s, represent different files.
-
- History of Decisions Made
-
- This description is based on the new awk, ``nawk,'' (see The AWK
- Programming Language {B21}), which introduced a number of new features to
- the historical awk:
-
- (1) New keywords: delete, do, function, return
-
- (2) New built-in functions: atan2, cos, sin, rand, srand, gsub,
- sub, match, close, system
-
- (3) New predefined variables: FNR, ARGC, ARGV, RSTART, RLENGTH,
- SUBSEP
-
- (4) New expression operators: ?:, ^
-
- (5) The FS variable and the third argument to split are now treated
- as extended regular expressions.
-
- (6) The operator precedence has changed to more closely match C.
- Two examples of code that operate differently are:
-
- while ( n /= 10 > 1) ...
- if (!"wk" ~ /bwk/) ...
-
- Several features have been added based on newer implementations of awk:
-
- (1) Multiple instances of -f progfile are permitted.
-
- (2) New option: -v assignment
-
- (3) New predefined variable: ENVIRON
-
-
- (4) New built-in functions: toupper, tolower
-
- (5) More formatting capabilities added to printf to match the
- C Standard {7}.
-
- Regular expressions have been extended somewhat from traditional
- implementations to make them a pure superset of Extended Regular
- Expressions as defined by this standard (see 2.8.4). The main extensions
- are internationalization features and interval expressions. Traditional
- implementations of awk have long supported <backslash> escape sequences
- as an extension to regular expressions, and this extension has been
- retained despite inconsistency with other utilities. The number of
- escape sequences recognized in both regular expressions and strings has
- varied (generally increasing with time) among implementations. The set
- specified by the standard includes most sequences known to be supported
- by popular implementations and by the C Standard {7}. One sequence that
- is not supported is hexadecimal value escapes beginning with "\x". This
- would allow values expressed in more than 9 bits to be used within awk as
- in the C Standard {7}. However, because this syntax has a
- nondeterministic length, it does not permit the subsequent character to
- be a hexadecimal digit. This limitation can be worked around in the
- C language by the use of lexical string concatenation. In the awk
- language, concatenation could also be a solution for strings, but not for
- regular expressions (either lexical ERE tokens or strings used
- dynamically as regular expressions). Because of this limitation, the
- feature has not been added to POSIX.2.
-
- When a string variable is used in a context where an ERE normally appears
- (where the lexical token ERE is used in the grammar) the string does not
- contain the literal slashes.
-
- Some versions of awk allow the form:
-
- func name(args,...) { statements }
-
- This has been deprecated by the language's authors, who have asked that
- it not be included in the standard.
-
- Traditional implementations of awk produce an error if a next statement
- is executed in a BEGIN action, and cause awk to terminate if a next
- statement is executed in an END action. This behavior has not been
- documented, and it was not believed that it was necessary to standardize
- it.
-
- The specification of conversions between string and numeric values is
- much more detailed than in the documentation of traditional
- implementations or in The AWK Programming Language {B21}. Although most
- of the behavior is designed to be intuitive, the details are necessary to
- ensure compatible behavior from different implementations. This is
- especially important in relational expressions, since the types of the
- operands determine whether a string or numeric comparison is performed.
- From the perspective of an application writer, it is usually sufficient
- to expect intuitive behavior and to force conversions (by adding zero or
- concatenating a null string) when the type of an expression does not
- obviously match what is needed. The intent has been to specify existing
- practice in almost all cases. The one exception is that, in traditional
- implementations, variables and constants maintain both string and numeric
- values after their original value is converted by any use. This means
- that referencing a variable or constant can have unexpected side effects.
- For example, with traditional implementations the following program:
-
- {
-     a = "+2"
-     b = 2
-     if (NR % 2)
-         c = a + b
-     if (a == b)
-         print "numeric comparison"
-     else
-         print "string comparison"
- }
-
- would perform a numeric comparison (and output numeric comparison) for
- each odd-numbered line, but perform a string comparison (and output
- string comparison) for each even-numbered line. POSIX.2 ensures that
- comparisons will be numeric if necessary. With traditional
- implementations, the following program:
-
- BEGIN {
- OFMT = "%e"
- print 3.14
- OFMT = "%f"
- print 3.14
- }
-
- would output 3.140000e+00 twice, because in the second print statement
- the constant 3.14 would have a string value from the previous conversion.
- The standard requires that the output of the second print statement be
- 3.140000. The behavior of traditional implementations was seen as too
- unintuitive and unpredictable.
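-
- The conversion-forcing idiom mentioned above (adding zero to obtain a
- numeric value) can be sketched as follows, assuming a POSIX-style awk;
- the function name force_compare is hypothetical. Both operands are
- string constants, so the plain comparison is a string comparison.

```shell
# force_compare: hypothetical helper contrasting string and numeric
# comparison of the same two values.
force_compare() {
    awk 'BEGIN {
        a = "10"; b = "9"                  # string constants
        if (a < b)         print "as strings: 10 < 9"
        if (a + 0 > b + 0) print "as numbers: 10 > 9"
    }'
}
force_compare
```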
-
- However, a further modification was made in Draft 11. It was pointed out
- that with the Draft 10 rules, the following script would print nothing:
-
- BEGIN {
- y[1.5] = 1
- OFMT = "%e"
- print y[1.5]
- }
-
- Therefore, a new variable, CONVFMT, was introduced. The OFMT variable is
- now restricted to affecting output conversions of numbers to strings and
- CONVFMT is used for internal conversions, such as comparisons or array
- indexing. The default value is the same as that for OFMT, so unless a
- program changes CONVFMT (which no historical program would do), it will
- receive the historical behavior associated with internal string
- conversions.
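-
- A minimal sketch of the split between OFMT and CONVFMT, assuming an awk
- that implements CONVFMT as described here; the function name
- convfmt_demo is hypothetical.

```shell
# convfmt_demo: hypothetical helper showing that OFMT no longer affects
# array subscripts, while CONVFMT governs internal conversions.
convfmt_demo() {
    awk 'BEGIN {
        y[1.5] = 1          # subscript converted with the default CONVFMT
        OFMT = "%e"         # affects only numbers written by print
        print y[1.5]        # subscript is still "1.5", element is found
        CONVFMT = "%.2f"    # affects internal number-to-string conversion
        s = 3.14159 ""      # concatenation forces a string conversion
        print s
    }'
}
convfmt_demo
```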
-
- The POSIX awk lexical and syntactic conventions are specified more
- formally than in other sources. Again the intent has been to specify
- existing practice. One convention that may not be obvious from the
- formal grammar, as in other verbal descriptions, is where <newline>s are
- acceptable. There are several obvious placements such as terminating a
- statement, and a backslash can be used to escape <newline>s between any
- lexical tokens. In addition, <newline>s without backslashes can follow a
- comma, an open brace, the logical AND operator (&&), the logical OR
- operator (||), the do keyword, the else keyword, and the closing
- parenthesis of an if, for, or while statement. For example:
-
- { print $1,
- $2 }
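-
- A slightly larger sketch of the same convention, assuming a POSIX-style
- awk (the function name newline_demo is hypothetical), places unescaped
- <newline>s after &&, after an open brace, and after a comma:

```shell
# newline_demo: hypothetical helper; the awk program relies on
# <newline>s being accepted after "&&", "{" and ",".
newline_demo() {
    printf 'a b\n' | awk '
    $1 == "a" &&
    $2 == "b" {
        print $1,
              $2
    }'
}
newline_demo
```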
-
- The requirement that awk add a trailing <newline> to the program argument
- text is to simplify the grammar, making it match a text file in form.
- There is no way for an application or test suite to determine whether a
- literal <newline> is added or whether awk simply acts as if it did.
-
- Because the concatenation operation is represented by adjacent
- expressions rather than an explicit operator, it is often necessary to
- use parentheses to enforce the proper evaluation precedence.
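-
- One commonly cited case is sketched below, assuming a POSIX-style awk;
- the function name concat_demo is hypothetical. Without parentheses the
- minus sign is taken as a binary operator rather than as the sign of the
- second operand of a concatenation.

```shell
# concat_demo: hypothetical helper contrasting subtraction with
# parenthesized concatenation.
concat_demo() {
    awk 'BEGIN {
        print "x" -1      # parsed as "x" - 1, a subtraction
        print "x" (-1)    # parentheses force concatenation of "x" and -1
    }'
}
concat_demo
```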
-
- The overall awk syntax has always been based on the C language, with a
- few features from the shell command language and other sources. Because
- of this, it is not completely compatible with any other language, which
- has caused confusion for some users. It is not the intent of this
- standard to address such issues. The standard has made a few relatively
- minor changes toward making the language more compatible with the
- C language as specified by the C Standard {7}; most of these changes are
- based on similar changes in recent implementations, as described above.
- There remain several C language conventions that are not in awk. One of
- the notable ones is the comma operator, which is commonly used to specify
- multiple expressions in the C language for statement. Also, there are
- various places where awk is more restrictive than the C language
- regarding the type of expression that can be used in a given context.
- These limitations are due to the different features that the awk language
- does provide.
-
- This standard requires several changes from traditional implementations
- in order to support internationalization. Probably the most subtle of
- these is the use of the decimal-point character, defined by the
- LC_NUMERIC category of the locale, in representations of floating point
- numbers. This locale-specific character is used in recognizing numeric
- input, in converting between strings and numeric values, and in
- formatting output. However, regardless of locale, the period character
- (the decimal-point character of the POSIX Locale) is the decimal-point
- character recognized in processing awk programs (including assignments in
- command-line arguments). This is essentially the same convention as the
- one used in the C Standard {7}. The difference is that the C language
- includes the setlocale() function, which permits an application to modify
- its locale. Because of this capability, a C application begins executing
- with its locale set to the C locale, and only executes in the
- environment-specified locale after an explicit call to setlocale().
- However, adding such an elaborate new feature to the awk language was
- seen as inappropriate for POSIX.2. It is possible to explicitly execute
- an awk program in any desired locale by setting the environment in the
- shell.
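-
- Setting the locale from the shell might look like the following sketch,
- which assumes sh and awk; the function name sum_in_posix_locale is
- hypothetical. With LC_ALL=C the period is the decimal-point character
- for numeric input and for output.

```shell
# sum_in_posix_locale: hypothetical helper; runs one awk program with
# the locale forced to the POSIX (C) locale for that command only.
sum_in_posix_locale() {
    LC_ALL=C awk '{ s += $1 } END { print s }'
}
printf '1.5\n2.25\n' | sum_in_posix_locale
```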
-
- The behavior in the case of invalid awk programs (including lexical,
- syntactic, and semantic errors) is undefined because it was considered
- overly limiting on implementations to specify. In most cases such errors
- can be expected to produce a diagnostic and a nonzero exit status.
- However, some implementations may choose to extend the language in ways
- that make use of certain invalid constructs. Other invalid constructs
- might be deemed worthy of a warning but otherwise cause some reasonable
- behavior. Still other constructs may be very difficult to detect in some
- implementations. Also, different implementations might detect a given
- error during an initial parsing of the program (before reading any input
- files) while others might detect it when executing the program after
- reading some input. Implementors should be aware that diagnosing errors
- as early as possible and producing useful diagnostics can ease debugging
- of applications, and thus make an implementation more usable.
-
- The unspecified behavior from using multicharacter RS values is to allow
- possible future extensions based on regular expressions used for record
- separators. Historical implementations take the first character of the
- string and ignore the others.
-
- The undefined behavior resulting from NULs in regular expressions allows
- future extensions for the GNU gawk program to process binary data.
-
- Unspecified behavior when split(string,array,<null>) is used is to allow
- a proposed future extension that would split up a string into an array of
- individual characters.
-
- END_RATIONALE
-
-
-
- 4.2 basename - Return nondirectory portion of pathname
-
-
- 4.2.1 Synopsis
-
- basename string [suffix]
-
-
- 4.2.2 Description
-
- The string operand shall be treated as a pathname, as defined in
- 2.2.2.102. The string string shall be converted to the filename
- corresponding to the last pathname component in string and then the
- suffix string suffix, if present, shall be removed. This shall be done
- by performing actions equivalent to the following steps in order:
-
- (1) If string is //, it is implementation defined whether steps (2)
- through (5) are skipped or processed.
-
- (2) If string consists entirely of slash characters, string shall be
- set to a single slash character. In this case, skip steps (3)
- through (5).
-
- (3) If there are any trailing slash characters in string, they shall
- be removed.
-
- (4) If there are any slash characters remaining in string, the
- prefix of string up to and including the last slash character in
- string shall be removed.
-
- (5) If the suffix operand is present, is not identical to the
- characters remaining in string, and is identical to a suffix of
- the characters remaining in string, the suffix suffix shall be
- removed from string. Otherwise, string shall not be modified by
- this step. It shall not be considered an error if suffix is not
- found in string.
-
- The resulting string shall be written to standard output.
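-
- The steps above can be sketched in the shell command language itself.
- This is an illustration only, not a conforming implementation: the
- function name my_basename is hypothetical, and for step (1) the sketch
- simply assumes that // is processed like any other string.

```shell
# my_basename: hypothetical re-implementation of steps (2) through (5).
my_basename() {
    string=$1 suffix=${2-}
    case $string in
        *[!/]*) ;;                        # has a non-slash character
        ?*) printf '/\n'; return 0 ;;     # step (2): only slashes
    esac
    # Step (3): remove trailing slash characters.
    while [ "${string%/}" != "$string" ]; do
        string=${string%/}
    done
    # Step (4): remove the prefix up to and including the last slash.
    string=${string##*/}
    # Step (5): remove suffix unless it is the whole remaining string.
    if [ -n "$suffix" ] && [ "$suffix" != "$string" ]; then
        string=${string%"$suffix"}
    fi
    printf '%s\n' "$string"
}
my_basename /usr/src/cmd/cat.c .c
```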
-
-
- 4.2.3 Options
-
- None.
-
-
- 4.2.4 Operands
-
- The following operands shall be supported by the implementation:
-
- string A string.
-
- suffix A string.
-
-
- 4.2.5 External Influences
-
-
- 4.2.5.1 Standard Input
-
- None.
-
- 4.2.5.2 Input Files
-
- None.
-
-
- 4.2.5.3 Environment Variables
-
- The following environment variables shall affect the execution of
- basename:
-
- LANG This variable shall determine the locale to use for
- the locale categories when both LC_ALL and the
- corresponding environment variable (beginning with
- LC_) do not specify a locale. See 2.6.
-
- LC_ALL This variable shall determine the locale to be used
- to override any values for locale categories
- specified by the settings of LANG or any
- environment variables beginning with LC_.
-
- LC_CTYPE This variable shall determine the locale for the
- interpretation of sequences of bytes of text data
- as characters (e.g., single- versus multibyte
- characters in arguments).
-
- LC_MESSAGES This variable shall determine the language in which
- messages should be written.
-
- 4.2.5.4 Asynchronous Events
-
- Default.
-
-
- 4.2.6 External Effects
-
-
- 4.2.6.1 Standard Output
-
- The basename utility shall write a line to the standard output in the
- following format:
-
- "%s\n", <resulting string>
-
- 4.2.6.2 Standard Error
-
- Used only for diagnostic messages.
-
-
- 4.2.6.3 Output Files
-
- None.
-
-
- 4.2.7 Extended Description
-
- None.
-
-
- 4.2.8 Exit Status
-
- The basename utility shall exit with one of the following values:
-
- 0 Successful completion.
-
- >0 An error occurred.
-
-
- 4.2.9 Consequences of Errors
-
- Default.
-
- BEGIN_RATIONALE
-
-
- 4.2.10 Rationale. (This subclause is not a part of P1003.2)
-
- Examples, Usage
-
- If the string string is a valid pathname,
-
- $(basename "string")
-
- produces a filename that could be used to open the file named by string
- in the directory returned by
-
- $(dirname "string")
-
- If the string string is not a valid pathname, the same algorithm is used,
- but the result need not be a valid filename. The basename utility is not
- expected to make any judgements about the validity of string as a
- pathname; it just follows the specified algorithm to produce a result
- string.
-
- The following shell script compiles /usr/src/cmd/cat.c and moves the
- output to a file named cat in the current directory when invoked with the
- argument /usr/src/cmd/cat or with the argument /usr/src/cmd/cat.c:
-
- c89 $(dirname "$1")/$(basename "$1" .c).c
- mv a.out $(basename "$1" .c)
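-
- The two expressions used in that script can be examined without running
- a compiler. The sketch below assumes the basename and dirname utilities
- are available; the function name target_name is hypothetical. It writes
- the source pathname and the output name the script would use.

```shell
# target_name: hypothetical helper printing "source -> output" for the
# argument forms accepted by the script above.
target_name() {
    printf '%s -> %s\n' \
        "$(dirname "$1")/$(basename "$1" .c).c" \
        "$(basename "$1" .c)"
}
target_name /usr/src/cmd/cat
target_name /usr/src/cmd/cat.c
```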
-
- History of Decisions Made
-
- The POSIX.1 {8} definition of pathname allows trailing slashes on a
- pathname naming a directory. Some historical implementations have not
- allowed trailing slashes and thus treated pathnames of this form in other
- ways. Existing implementations also differ in their handling of suffix
- when suffix matches the entire string left after removing the directory
- part of string.
-
- The behaviors of basename and dirname in this standard have been
- coordinated so that when string is a valid pathname
-
- $(basename "string")
-
- would be a valid filename for the file in the directory
-
- $(dirname "string")
-
- This would not work for the versions of these utilities in earlier drafts
- due to the way they specified handling of trailing slashes.
-
- Since the definition of pathname in 2.2.2.102 specifies
- implementation-defined behavior for pathnames starting with two slash
- characters, Draft
- 11 has been changed to specify similar implementation-defined behavior
- for the basename and dirname utilities. On implementations where the
- pathname // is always treated the same as the pathname /, the
- functionality required by Draft 10 meets all of the Draft 11
- requirements.
-
- END_RATIONALE
-
-
-
- 4.3 bc - Arbitrary-precision arithmetic language
-
-
- 4.3.1 Synopsis
-
- bc [-l] [file ...]
-
-
- 4.3.2 Description
-
- The bc utility shall implement an arbitrary precision calculator. It
- shall take input from any files given, then read from the standard input.
- If the standard input and standard output to bc are attached to a
- terminal, the invocation of bc shall be considered to be interactive,
- causing behavioral constraints described in the following subclauses.
-
-
- 4.3.3 Options
-
- The bc utility shall conform to the utility argument syntax guidelines
- described in 2.10.2.
-
- The following option shall be supported by the implementation:
-
- -l (The letter ell.) Define the math functions and
- initialize scale to 20, instead of the default zero. See
- 4.3.7.
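-
- The effect of -l on scale can be observed with a short sketch, assuming
- a bc utility is installed; the function name bc_default_scale is
- hypothetical. Evaluating the expression scale writes its current value.

```shell
# bc_default_scale: hypothetical helper; passes any options (such as -l)
# through to bc and asks it to write the initial value of scale.
bc_default_scale() {
    echo 'scale' | bc "$@"
}
```

- Invoked with no arguments the function is expected to write 0, and with
- -l it is expected to write 20.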
-
-
- 4.3.4 Operands
-
- The following operands shall be supported by the implementation:
-
- file A pathname of a text file containing bc program
- statements. After all files have been read, bc shall read
- the standard input.
-
-
- 4.3.5 External Influences
-
-
- 4.3.5.1 Standard Input
-
- See Input Files.
-
- 4.3.5.2 Input Files
-
- Input files shall be text files containing a sequence of comments,
- statements, and function definitions that shall be executed as they are
- read.
-
-
- 4.3.5.3 Environment Variables
-
- The following environment variables shall affect the execution of bc:
-
- LANG This variable shall determine the locale to use for
- the locale categories when both LC_ALL and the
- corresponding environment variable (beginning with
- LC_) do not specify a locale. See 2.6.
-
- LC_ALL This variable shall determine the locale to be used
- to override any values for locale categories
- specified by the settings of LANG or any
- environment variables beginning with LC_.
-
- LC_CTYPE This variable shall determine the locale for the
- interpretation of sequences of bytes of text data
- as characters (e.g., single- versus multibyte
- characters in arguments and input files).
-
- LC_MESSAGES This variable shall determine the language in which
- messages should be written.
-
- 4.3.5.4 Asynchronous Events
-
- Default.
-
-
- 4.3.6 External Effects
-
-
- 4.3.6.1 Standard Output
-
- The output of the bc utility shall be controlled by the program read, and
- shall consist of zero or more lines containing the value of all executed
- expressions without assignments. The radix and precision of the output
- shall be controlled by the values of the obase and scale variables. See
- 4.3.7.
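-
- A minimal sketch of these output rules, assuming a bc utility is
- installed (the function name bc_output_demo is hypothetical): the
- assignments write nothing, while the two bare expressions each write
- one line, the first in base 16 and the second with three fractional
- digits.

```shell
# bc_output_demo: hypothetical helper feeding a small program to bc.
bc_output_demo() {
    printf '%s\n' \
        'x = 5' \
        'obase = 16' \
        '255' \
        'scale = 3' \
        'obase = 10' \
        '1 / 3' | bc
}
```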
-
-
- 4.3.6.2 Standard Error
-
- Used only for diagnostic messages.
-
- 4.3.6.3 Output Files
-
- None.
-
-
- 4.3.7 Extended Description
-
-
- 4.3.7.1 bc Grammar
-
- The grammar in this subclause and the lexical conventions in the
- following subclause shall together describe the syntax for bc programs.
- The general conventions for this style of grammar are described in 2.1.2.
- A valid program can be represented as the nonterminal symbol program in
- the grammar. Any discrepancies found between this grammar and other
- descriptions in this subclause (4.3.7) shall be resolved in favor of this
- grammar.
-
- %token EOF NEWLINE STRING LETTER NUMBER
-
- %token MUL_OP
- /* '*', '/', '%' */
-
- %token ASSIGN_OP
- /* '=', '+=', '-=', '*=', '/=', '%=', '^=' */
-
- %token REL_OP
- /* '==', '<=', '>=', '!=', '<', '>' */
-
- %token INCR_DECR
- /* '++', '--' */
-
- %token Define Break Quit Length
- /* 'define', 'break', 'quit', 'length' */
-
- %token Return For If While Sqrt
- /* 'return', 'for', 'if', 'while', 'sqrt' */
-
- %token Scale Ibase Obase Auto
- /* 'scale', 'ibase', 'obase', 'auto' */
-
- %start program
-
- %%
-
- program : EOF
- | input_item program
- ;
-
- input_item : semicolon_list NEWLINE
- | function
- ;
-
- semicolon_list : /* empty */
- | statement
- | semicolon_list ';' statement
- | semicolon_list ';'
- ;
-
- statement_list : /* empty */
- | statement
- | statement_list NEWLINE
- | statement_list NEWLINE statement
- | statement_list ';'
- | statement_list ';' statement
- ;
-
- statement : expression
- | STRING
- | Break
- | Quit
- | Return
- | Return '(' return_expression ')'
- | For '(' expression ';'
- relational_expression ';'
- expression ')' statement
- | If '(' relational_expression ')' statement
- | While '(' relational_expression ')' statement
- | '{' statement_list '}'
- ;
-
- function : Define LETTER '(' opt_parameter_list ')'
- '{' NEWLINE opt_auto_define_list
- statement_list '}'
- ;
-
- opt_parameter_list : /* empty */
- | parameter_list
- ;
-
- parameter_list : LETTER
- | define_list ',' LETTER
- ;
-
- opt_auto_define_list : /* empty */
- | Auto define_list NEWLINE
- | Auto define_list ';'
- ;
-
- define_list : LETTER
- | LETTER '[' ']'
- | define_list ',' LETTER
- | define_list ',' LETTER '[' ']'
- ;
-
- opt_argument_list : /* empty */
- | argument_list
- ;
-
- argument_list : expression
- | argument_list ',' expression
- ;
-
- relational_expression : expression
- | expression REL_OP expression
- ;
-
- return_expression : /* empty */
- | expression
- ;
-
- expression : named_expression
- | NUMBER
- | '(' expression ')'
- | LETTER '(' opt_argument_list ')'
- | '-' expression
- | expression '+' expression
- | expression '-' expression
- | expression MUL_OP expression
- | expression '^' expression
- | INCR_DECR named_expression
- | named_expression INCR_DECR
- | named_expression ASSIGN_OP expression
- | Length '(' expression ')'
- | Sqrt '(' expression ')'
- | Scale '(' expression ')'
- ;
-
- named_expression : LETTER
- | LETTER '[' expression ']'
- | Scale
- | Ibase
- | Obase
- ;
-
-
- 4.3.7.2 bc Lexical Conventions
-
- The lexical conventions for bc programs, with respect to the preceding
- grammar, shall be as follows:
-
- (1) Except as noted, bc shall recognize the longest possible token
- or delimiter beginning at a given point.
-
- (2) A comment shall consist of any characters beginning with the two
- adjacent characters /* and terminated by the next occurrence of
- the two adjacent characters */. Comments shall have no effect
- except to delimit lexical tokens.
-
- (3) The character <newline> shall be recognized as the token
- NEWLINE.
-
- (4) The token STRING shall represent a string constant; it shall
- consist of any characters beginning with the double-quote
- character (") and terminated by another occurrence of the
- double-quote character. The value of the string shall be the
- sequence of all characters between, but not including, the two
- double-quote characters. All characters shall be taken
- literally from the input, and there is no way to specify a
- string containing a double-quote character. The length of the
- value of each string shall be limited to {BC_STRING_MAX} bytes.
-
- (5) A <blank> shall have no effect except as an ordinary character
- if it appears within a STRING token, or to delimit a lexical
- token other than STRING.
-
- (6) The combination of a backslash character immediately followed by
- a <newline> character shall delimit lexical tokens with the
- following exceptions:
-
- - It shall be interpreted as a literal <newline> in STRING
- tokens.
-
- - It shall be ignored as part of a multiline NUMBER token.
-
- (7) The token NUMBER shall represent a numeric constant. It shall
- be recognized by the following grammar:
-
- NUMBER : integer
- | '.' integer
- | integer '.'
- | integer '.' integer
- ;
-
- integer : digit
- | integer digit
- ;
-
- digit : 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
- | 8 | 9 | A | B | C | D | E | F
- ;
-
- (8) The value of a NUMBER token shall be interpreted as a numeral in
- the base specified by the value of the internal register ibase
- (described below). Each of the digit characters shall have the
- value from 0 to 15 in the order listed here, and the period
- character shall represent the radix point. The behavior is
- undefined if digits greater than or equal to the value of ibase
- appear in the token. (However, note the exception for single-
- digit values being assigned to ibase and obase themselves, in
- 4.3.7.3).
-
- (9) The following keywords shall be recognized as tokens:
-
- auto for length return sqrt
- break ibase obase scale while
- define if quit
-
- (10) Any of the following characters occurring anywhere except within
- a keyword shall be recognized as the token LETTER:
-
- a b c d e f g h i j k l m n o p q r s t u v w x y z
-
- (11) The following single-character and two-character sequences shall
- be recognized as the token ASSIGN_OP:
-
- = += -= *= /= %= ^=
-
- (12) If an = character, as the beginning of a token, is followed by a
- - character with no intervening delimiter, the behavior is
- undefined.
-
- (13) The following single characters shall be recognized as the token
- MUL_OP:
-
- * / %
-
- (14) The following single-character and two-character sequences shall
- be recognized as the token REL_OP:
-
- == <= >= != < >
-
- (15) The following two-character sequences shall be recognized as the
- token INCR_DECR:
-
- ++ --
-
- (16) The following single characters shall be recognized as tokens
- whose names are the character:
-
- <newline> ( ) , + - ; [ ] ^ { }
-
- (17) The token EOF shall be returned when the end of input is
- reached.
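
Rules (7) and (8) can be read as a small evaluator for NUMBER tokens. The following Python sketch is illustrative only and is not part of the draft: the names number_value and digit_value are hypothetical, and raising an error for a digit that is not below ibase is this sketch's choice for a case the draft leaves undefined.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"  # digit characters in the order listed in rule (7)


def digit_value(ch: str, ibase: int) -> int:
    """Map one digit character to its value, rejecting digits >= ibase."""
    value = DIGITS.find(ch)
    if value < 0:
        raise ValueError(f"not a bc digit: {ch!r}")
    if value >= ibase:
        # The draft leaves this case undefined; rejecting it is an assumption.
        raise ValueError(f"digit {ch!r} is not below ibase {ibase}")
    return value


def number_value(token: str, ibase: int = 10) -> Fraction:
    """Return the exact value of a NUMBER token read in base ibase."""
    if not 2 <= ibase <= 16:
        raise ValueError("ibase must be in the range 2 to 16")
    # Rule (6): a backslash-newline pair inside a NUMBER token is ignored.
    token = token.replace("\\\n", "")
    int_part, _, frac_part = token.partition(".")
    if not int_part and not frac_part:
        raise ValueError("a NUMBER token needs at least one digit")
    value = Fraction(0)
    for ch in int_part:
        value = value * ibase + digit_value(ch, ibase)
    weight = Fraction(1, ibase)  # weight of the first fractional digit
    for ch in frac_part:
        value += digit_value(ch, ibase) * weight
        weight /= ibase
    return value
```

Under these assumptions, number_value("FF", 16) yields 255 and number_value("1.1", 2) yields 3/2.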
-
-
- 4.3.7.3 bc Operations
-
- There are three kinds of identifiers: ordinary identifiers, array
- identifiers, and function identifiers. All three types consist of single
- lowercase letters. Array identifiers shall be followed by square
- brackets ([ ]). An array subscript is required except in an argument or
- auto list. Arrays are singly dimensioned and can contain up to
- {BC_DIM_MAX} elements. Indexing begins at zero so an array is indexed
- from 0 to {BC_DIM_MAX}-1. Subscripts shall be truncated to integers.
- Function identifiers shall be followed by parentheses, possibly enclosing
- arguments. The three types of identifiers do not conflict.
-
- Table 4-3 summarizes the rules for precedence and associativity of all
- operators. Operators on the same line shall have the same precedence;
- rows are in order of decreasing precedence.
-
- Each expression or named expression has a scale, which is the number of
- decimal digits that shall be maintained as the fractional portion of the
- expression.
-
- Table 4-3 - bc Operators
- ____________________________________________________________
- Operator                        Associativity
- ____________________________________________________________
- ++, --                          not applicable
- unary -                         not applicable
- ^                               right to left
- *, /, %                         left to right
- +, binary -                     left to right
- =, +=, -=, *=, /=, %=, ^=       right to left
- ==, <=, >=, !=, <, >            none
- ____________________________________________________________
-
-
- Named expressions are places where values are stored. Named expressions
- shall be valid on the left side of an assignment. The value of a named
- expression shall be the value stored in the place named. Simple
- identifiers and array elements shall be named expressions; they shall
- have an initial value of zero and an initial scale of zero.
-
- The internal registers scale, ibase, and obase are all named expressions.
- The scale of an expression consisting of the name of one of these
- registers shall be zero; values assigned to any of these registers shall
- be truncated to integers. The scale register shall contain a global
- value used in computing the scale of expressions (as described below).
- The value of the register scale shall be limited to 0 <= scale <=
- {BC_SCALE_MAX} and shall have a default value of zero. The ibase and
- obase registers are the input and output number radix, respectively. The
- value of ibase shall be limited to
-
- 2 <= ibase <= 16
-
- The value of obase shall be limited to
-
- 2 <= obase <= {BC_BASE_MAX}
-
- When either ibase or obase is assigned a single digit value from the list
- in 4.3.7.2, the value shall be assumed in hexadecimal. (For example,
- ibase=A sets to base ten, regardless of the current ibase value.)
- Otherwise, the behavior is undefined when digits greater than or equal to
- the value of ibase appear in the input. Both ibase and obase shall have
- initial values of 10.
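
The single-digit rule for ibase and obase assignments can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the draft: base_assignment_value is a hypothetical helper, it handles integer tokens only, and it raises an error where the draft says the behavior is undefined.

```python
DIGITS = "0123456789ABCDEF"


def base_assignment_value(token: str, ibase: int) -> int:
    """Value of an integer NUMBER token assigned to ibase or obase."""
    if not token:
        raise ValueError("empty token")
    if len(token) == 1 and token in DIGITS:
        # A single digit is taken as hexadecimal, whatever ibase currently is.
        return DIGITS.index(token)
    value = 0
    for ch in token:
        digit = DIGITS.find(ch)
        if digit < 0 or digit >= ibase:
            # Undefined in the draft; this sketch refuses the token instead.
            raise ValueError(f"digit {ch!r} is not valid in base {ibase}")
        value = value * ibase + digit
    return value
```

With this reading, the token A assigned while ibase is 2 still yields ten, whereas the token 10 yields whatever the current ibase is, so ibase=10 leaves ibase unchanged.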
-
- Internal computations shall be conducted as if in decimal, regardless of
- the input and output bases, to the specified number of decimal digits.
- When an exact result is not achieved (e.g., scale=0; 3.2/1), the result
- shall be truncated.
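
Truncation to a given number of decimal digits can be modeled with exact rational arithmetic. The Python sketch below is only an illustration: the helper name truncate is hypothetical, and it assumes that "truncated" means discarding digits toward zero.

```python
from fractions import Fraction


def truncate(value: Fraction, scale: int) -> str:
    """Render value with exactly `scale` fractional digits, discarding the rest."""
    scaled = abs(value) * 10 ** scale
    units = scaled.numerator // scaled.denominator  # floor of a non-negative number
    sign = "-" if value < 0 and units != 0 else ""
    digits = str(units).rjust(scale + 1, "0")
    if scale == 0:
        return sign + digits
    return sign + digits[:-scale] + "." + digits[-scale:]
```

For the example in the text, truncate(Fraction("3.2") / 1, 0) gives "3".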
-
- For all values of obase specified by this standard, numerical values
- shall be output as follows:
-
- (1) If the value is less than zero, a hyphen (-) character shall be
- output.
-
- (2) One of the following shall be output, depending on the numerical
- value:
-
- - If the absolute value of the numerical value is greater than
- or equal to one, the integer portion of the value shall be
- output as a series of digits appropriate to obase (as
- described below). The most significant nonzero digit shall
- be output next, followed by each successively less
- significant digit.
-
- - If the absolute value of the numerical value is less than one
- but greater than zero and the scale of the numerical value is
- greater than zero, it is unspecified whether the character 0
- is output.
-
- - If the numerical value is zero, the character 0 shall be
- output.
-
- (3) If the scale of the value is greater than zero, a period
- character shall be output, followed by a series of digits
- appropriate to obase (as described below) representing the most
- significant portion of the fractional part of the value. If s
- represents the scale of the value being output, the number of
- digits output shall be s if obase is 10, less than or equal to s
- if obase is greater than 10, or greater than or equal to s if
- obase is less than 10. For obase values other than 10, this
- should be the number of digits needed to represent a precision
- of 10^s.
-
- For obase values from 2 to 16, valid digits are the first obase of the
- single characters
-
- 0 1 2 3 4 5 6 7 8 9 A B C D E F
-
- which represent the values zero through fifteen, respectively.
-
- For bases greater than 16, each ``digit'' shall be written as a separate
- multidigit decimal number. Each digit except the most significant
- fractional digit shall be preceded by a single <space> character. For bases
- from 17 to 100, bc shall write two-digit decimal numbers; for bases from
- 101 to 999, three-digit decimal strings, and so on. For example, the
- decimal number 1024 in base 25 would be written as:
-
-  01 15 24
-
- in base 125, as:
-
-  008 024
-
- Very large numbers shall be split across lines with 70 characters per
- line in the POSIX Locale; other locales may split at different character
- boundaries. Lines that are continued shall end with a backslash (\).
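
The digit rules for integer output, including bases greater than 16, can be sketched in Python as below. This is an illustration rather than a reference implementation: format_integer is a hypothetical name, fractional digits and the 70-character line splitting are not modeled, and taking the field width from the number of decimal digits in obase-1 is an assumption that matches the stated ranges (two digits for 17 to 100, three for 101 to 999).

```python
DIGITS = "0123456789ABCDEF"


def format_integer(value: int, obase: int) -> str:
    """Format an integer in base obase following the digit rules above."""
    if obase < 2:
        raise ValueError("obase must be at least 2")
    sign = "-" if value < 0 else ""
    remaining = abs(value)
    digits = []
    while True:
        remaining, digit = divmod(remaining, obase)
        digits.append(digit)
        if remaining == 0:
            break
    digits.reverse()  # most significant digit first
    if obase <= 16:
        return sign + "".join(DIGITS[d] for d in digits)
    width = len(str(obase - 1))  # assumed width of each multidigit "digit"
    # Each "digit" is a zero-padded decimal number preceded by a single space.
    return sign + "".join(" " + str(d).rjust(width, "0") for d in digits)
```

Under these assumptions, format_integer(1024, 25) returns " 01 15 24" and format_integer(1024, 125) returns " 008 024", in line with the two examples in the text.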
-
- A function call shall consist of a function name followed by parentheses
- containing a comma-separated list of expressions, which are the function
- arguments. A whole array passed as an argument shall be specified by the
- array name followed by empty square brackets. All function arguments
- shall be passed by value. As a result, changes made to the formal
- parameters have no effect on the actual arguments. If the function
- terminates by executing a return statement, the value of the function
- shall be the value of the expression in the parentheses of the return
- statement or shall be zero if no expression is provided or if there is no
- return statement.
-
- The result of sqrt(expression) shall be the square root of the
- expression. The result shall be truncated in the least significant
- decimal place. The scale of the result shall be the scale of the
- expression or the value of scale, whichever is larger.
-
- The result of length(expression) shall be the total number of significant
- decimal digits in the expression. The scale of the result shall be zero.
-
- The result of scale(expression) shall be the scale of the expression.
- The scale of the result shall be zero.
-
- A numeric constant shall be an expression. The scale shall be the number
- of digits that follow the radix point in the input representing the
- constant, or zero if no radix point appears.
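
As a small illustration of the scale of a numeric constant, the hypothetical Python helper below counts the digits after the radix point in the input text. It is a sketch only and assumes the token has already been recognized as a NUMBER.

```python
def constant_scale(token: str) -> int:
    """Scale of a numeric constant: digits after the radix point, else zero."""
    _, point, fraction = token.partition(".")
    return len(fraction) if point else 0
```

Trailing zeros count, so the constant 3.140 has scale 3 while 42 and 7. both have scale zero.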
-
- The sequence ( expression ) shall be an expression with the same value
- and scale as expression. The parentheses can be used to alter the normal
- precedence.
-
- The semantics of the unary and binary operators are as follows.
-
- -expression
- The result shall be the negative of the expression. The
- scale of the result shall be the scale of expression.
-
- The unary increment and decrement operators shall not modify the scale of
- the named expression upon which they operate. The scale of the result
- shall be the scale of that named expression.
-
- ++named-expression
- The named expression shall be incremented by one. The result
- shall be the value of the named expression after
- incrementing.
-
- --named-expression
- The named expression shall be decremented by one. The result
- shall be the value of the named expression after
- decrementing.
-
- named-expression++
- The named expression shall be incremented by one. The result
- shall be the value of the named expression before
- incrementing.
-
- named-expression--
- The named expression shall be decremented by one. The result
- shall be the value of the named expression before
- decrementing.
-
- The exponentiation operator, circumflex (^), shall bind right to left.
-
- expression^expression
- The result shall be the first expression raised to the power
- of the second expression. If the second expression is not an
- integer, the behavior is undefined. If a is the scale of the
- left expression and b is the absolute value of the right
- expression, the scale of the result shall be:
-
- if b >= 0 min(a * b, max(scale, a))
- if b < 0 scale
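
The scale rule for exponentiation can be written as a short Python function. This is a sketch under one explicit assumption: because b is described as an absolute value, the sign test is taken here to apply to the right operand itself, while the product uses its absolute value. The name power_scale is hypothetical.

```python
def power_scale(a: int, exponent: int, scale: int) -> int:
    """Scale of left^exponent, where a is the scale of the left operand."""
    if exponent >= 0:
        b = abs(exponent)
        return min(a * b, max(scale, a))
    # Negative exponent: the result takes the value of the scale register.
    return scale
```

For example, with scale set to 10, raising an operand of scale 2 to the third power gives a result of scale 6; with scale set to 0 the same operation gives scale 2.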
-
- The multiplicative operators (*, /, %) shall bind left to right.
-
- expression * expression
- The result shall be the product of the two expressions. If a
- and b are the scales of the two expressions, then the scale
- of the result shall be:
-
- min(a+b,max(scale,a,b))
-
- expression / expression
- The result shall be the quotient of the two expressions. The
- scale of the result shall be the value of scale.
-
- expression % expression
- For expressions a and b, a % b shall be evaluated equivalent
- to the steps:
-
- (1) Compute a/b to current scale.
-
- (2) Use the result to compute
-
- a - (a / b) * b
-
- to scale
-
- max(scale + scale(b), scale(a))
-
- The scale of the result shall be
-
- max(scale + scale(b), scale(a))
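
The scale rule for multiplication and the two steps for the modulus operator can be modeled in Python with exact fractions. The sketch below is illustrative only: the names trunc, mul_scale, and bc_mod are hypothetical, operand scales are passed explicitly, and truncation is assumed to be toward zero.

```python
from fractions import Fraction


def trunc(value: Fraction, scale: int) -> Fraction:
    """Discard all decimal digits of value beyond `scale` places."""
    factor = 10 ** scale
    scaled = value * factor
    units = abs(scaled.numerator) // scaled.denominator
    return Fraction(units if value >= 0 else -units, factor)


def mul_scale(a: int, b: int, scale: int) -> int:
    """Scale of a product whose operands have scales a and b."""
    return min(a + b, max(scale, a, b))


def bc_mod(a: Fraction, scale_a: int, b: Fraction, scale_b: int, scale: int):
    """Return (a % b, scale of the result) following steps (1) and (2)."""
    quotient = trunc(a / b, scale)                # step (1): a/b to current scale
    result_scale = max(scale + scale_b, scale_a)
    remainder = trunc(a - quotient * b, result_scale)  # step (2)
    return remainder, result_scale
```

With scale set to 0, bc_mod applied to 7 and 3 gives the familiar remainder 1; with scale set to 2, the quotient is 2.33 and the result is 0.01 with scale 2.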
-
- The additive operators (+, -) shall bind left to right.
-
- expression + expression
- The result shall be the sum of the two expressions. The
- scale of the result shall be the maximum of the scales of the
- expressions.
-
- expression - expression
- The result shall be the difference of the two expressions.
- The scale of the result shall be the maximum of the scales of
- the expressions.
-
- The assignment operators (=, +=, -=, *=, /=, %=, ^=) shall bind right to
- left.
-
- named-expression = expression
- This expression results in assigning the value of the
- expression on the right to the named expression on the left.
- The scale of both the named expression and the result shall
- be the scale of expression.
-
- The compound assignment forms
-
- named-expression <operator>= expression
-
- shall be equivalent to:
-
- named-expression = named-expression <operator> expression
-
- except that the named-expression shall be evaluated only once.
-
- Unlike all other operators, the relational operators (<, >, <=, >=, ==,
- !=) shall be only valid as the object of an if, while, or inside a for
- statement.
-
- expression1 < expression2
- The relation shall be true if the value of expression1 is
- strictly less than the value of expression2.
-
- expression1 > expression2
- The relation shall be true if the value of expression1 is
- strictly greater than the value of expression2.
-
- expression1 <= expression2
- The relation shall be true if the value of expression1 is
- less than or equal to the value of expression2.
-
- expression1 >= expression2
- The relation shall be true if the value of expression1 is
- greater than or equal to the value of expression2.
-
- expression1 == expression2
- The relation shall be true if the values of expression1 and
- expression2 are equal.
-
- expression1 != expression2
- The relation shall be true if the values of expression1 and
- expression2 are unequal.
-
- There are only two storage classes in bc, global and automatic (local).
- Only identifiers that are to be local to a function need be declared with
- the auto command. The arguments to a function shall be local to the
- function. All other identifiers are assumed to be global and available
- to all functions. All identifiers, global and local, have initial values
- of zero. Identifiers declared as auto shall be allocated on entry to the
- function and released on returning from the function. They therefore do
- not retain values between function calls. Auto arrays shall be specified
- by the array name followed by empty square brackets. On entry to a
- function, the old values of the names that appear as parameters and as
- automatic variables are pushed onto a stack. Until return is made from
- the function, reference to these names refers only to the new values.
-
- References to any of these names from other functions that are called
- from this function also refer to the new value until one of those
- functions uses the same name for a local variable.
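
The push-down behavior of parameters and automatic variables can be sketched in Python as below. This is a model under stated assumptions, not a description of any implementation: the Environment class and its methods are hypothetical, only ordinary (non-array) identifiers are covered, and arguments are assumed to be evaluated before the call.

```python
class Environment:
    """Holds the current value of every name plus a stack of saved values."""

    def __init__(self):
        self.values = {}  # current value of each name; absent names read as 0
        self.stack = []   # old values saved on function entry

    def get(self, name):
        return self.values.get(name, 0)

    def set(self, name, value):
        self.values[name] = value

    def call(self, local_names, args, body):
        """Call body with local_names (parameters first, then auto names)."""
        # On entry, push the old values of every parameter and auto name.
        self.stack.append({name: self.get(name) for name in local_names})
        for name in local_names:
            self.values[name] = 0           # locals start at zero
        for name, value in zip(local_names, args):
            self.values[name] = value       # parameters receive the arguments
        try:
            return body(self)
        finally:
            # On return, pop the saved values so the old ones reappear.
            self.values.update(self.stack.pop())
```

A function called from within body that does not declare the same name sees the new value, because lookups go through the one shared table; the old value becomes visible again only after the return.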
-
- When a statement is an expression, unless the main operator is an
- assignment, execution of the statement shall write the value of the
- expression followed by a <newline> character.
-
- When a statement is a string, execution of the statement shall write the
- value of the string.
-
- Statements separated by semicolon or <newline> shall be executed
- sequentially. In an interactive invocation of bc, each time a <newline>
- character is read that satisfies the grammatical production
-
- input_item : semicolon_list NEWLINE
-
- the sequential list of statements making up the semicolon_list shall be
- executed immediately and any output produced by that execution shall be
- written without any delay due to buffering.
-
- In an if statement [if (relation) statement] the statement shall be
- executed if the relation is true.
-
- The while statement [while (relation) statement] implements a loop in
- which the relation is tested; each time the relation is true, the
- statement shall be executed and the relation retested. When the relation
- is false, execution shall resume after statement.
-
- A for statement [for (expression; relation; expression) statement] shall
- be the same as:
-
-     first-expression
-     while (relation) {
-         statement
-         last-expression
-     }
-
- All three expressions shall be present.
-
- The break statement causes termination of a for or while statement.
-
- The auto statement [auto identifier[,identifier] ...] shall cause the
- values of the identifiers to be pushed down. The identifiers can be
- ordinary identifiers or array identifiers. Array identifiers shall be
- specified by following the array name by empty square brackets. The auto
- statement shall be the first statement in a function definition.
-
- A define statement:
-
- define LETTER ( opt_parameter_list ) {
-     opt_auto_define_list
-     statement_list
- }
-
- defines a function named LETTER. If a function named LETTER was
- previously defined, the define statement shall replace the previous
- definition. The expression
-
- LETTER ( opt_argument_list )
-
- shall invoke the function named LETTER. The behavior is undefined if the
- number of arguments in the invocation does not match the number of
- parameters in the definition. Functions shall be defined before they are
- invoked. A function shall be considered to be defined within its own
- body, so recursive calls shall be valid. The values of numeric constants
- within a function shall be interpreted in the base specified by the value
- of the ibase register when the function is invoked.
-
- The return statements [return and return(expression)] shall cause
- termination of a function and popping of its auto variables, and shall
- specify the result of the function. The first form shall be equivalent to
- return(0). The value and scale of an invocation of the function shall be
- the value and scale of the expression in parentheses.
-
- The quit statement (quit) shall stop execution of a bc program at the
- point where the statement occurs in the input, even if it occurs in a
- function definition, or in an if, for, or while statement.
-
- The following functions shall be defined when the -l option is specified:
-
- s ( Expression )    Sine of argument in radians
-
- c ( Expression )    Cosine of argument in radians
-
- a ( Expression )    Arctangent of argument
-
- l ( Expression )    Natural logarithm of argument
-
- e ( Expression )    Exponential function of argument
-
- j ( Expression , Expression )
-                     Bessel function of integer order
-
- The scale of an invocation of each of these functions shall be the value
- of the scale register when the function is invoked. The behavior is
- undefined if any of these functions is invoked with an argument outside
- the domain of the mathematical function.
-
-
- 4.3.8 Exit Status
-
- The bc utility shall exit with one of the following values:
-
- 0 All input files were processed successfully.
-
- unspecified An error occurred.
-
-
- 4.3.9 Consequences of Errors
-
- If any file operand is specified and the named file cannot be accessed,
- bc shall write a diagnostic message to standard error and terminate
- without any further action.
-
- In an interactive invocation of bc, the utility should print an error
- message and recover following any error in the input. In a
- noninteractive invocation of bc, invalid input causes undefined behavior.
-
- BEGIN_RATIONALE
-
-
- 4.3.10 Rationale. (This subclause is not a part of P1003.2)
-
- Examples, Usage
-
- This description is based on BC--An Arbitrary Precision Desk-Calculator
- Language by Lorinda Cherry and Robert Morris, in the BSD User Manual
- {B28}.
-
- Automatic variables in bc do not work in exactly the same way as in
- either C or PL/1.
-
- In the shell, the following assigns an approximation of the first ten
- digits of pi to the variable x:
-
-     x=$(printf "%s\n" 'scale = 10; 104348/33215' | bc)
-
- The following bc program prints the same approximation of pi, with a
- label, to standard output:
-
-     scale = 10
-     "pi equals "
-     104348 / 33215
-
- The following defines a function to compute an approximate value of the
- exponential function (note that such a function is predefined if the -l
- option is specified):
-
-     scale = 20
-     define e(x){
-         auto a, b, c, i, s
-         a = 1
-         b = 1
-         s = 1
-         for (i = 1; 1 == 1; i++){
-             a = a*x
-             b = b*i
-             c = a/b
-             if (c == 0) {
-                 return(s)
-             }
-             s = s+c
-         }
-     }
-
- The following prints approximate values of the exponential function of
- the first ten integers:
-
-     for (i = 1; i <= 10; ++i) {
-         e(i)
-     }
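
For readers who want to trace the series above outside bc, the following Python sketch mirrors the function e(x) for integer arguments. It is an approximation of the bc semantics under stated assumptions: trunc_div is a hypothetical helper that truncates a quotient toward zero at the given scale, and the products a*x and b*i are exact because all operands are integers.

```python
from fractions import Fraction


def trunc_div(a: Fraction, b: Fraction, scale: int) -> Fraction:
    """Quotient a/b with all digits beyond `scale` decimal places discarded."""
    factor = 10 ** scale
    scaled = a / b * factor
    units = abs(scaled.numerator) // scaled.denominator
    return Fraction(units if scaled >= 0 else -units, factor)


def e(x: int, scale: int = 20) -> Fraction:
    """Sum the series x^i / i! until a term truncates to zero."""
    a = Fraction(1)  # running power of x
    b = Fraction(1)  # running factorial
    s = Fraction(1)  # partial sum, starting with the i = 0 term
    i = 1
    while True:
        a = a * x
        b = b * i
        c = trunc_div(a, b, scale)
        if c == 0:
            return s
        s = s + c
        i += 1
```

The loop ends because the factorial eventually outgrows the power, so the truncated term reaches zero; each term is computed only to the chosen scale, so the last few digits of the sum are not exact.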
-
- History of Decisions Made
-
- The bc utility is traditionally implemented as a front-end processor for
- dc; dc was not selected to be part of the standard because bc was thought
- to have a more intuitive programmatic interface. Current implementations
- that implement bc using dc are expected to be compliant.
-
- The Exit Status for error conditions has been left unspecified for several
- reasons:
-
- (1) The bc utility is used in both interactive and noninteractive
- situations. Different exit codes may be appropriate for the two
- uses.
-
- (2) It is unclear when a nonzero exit should be given; divide-by-
- zero, undefined functions, and syntax errors are all
- possibilities.
-
- (3) It is not clear what utility the exit status has.
-
- (4) In the 4.3BSD, System V, and Ninth Edition implementations, bc
- works in conjunction with dc. dc is the parent, bc is the
- child. This was done to cleanly terminate bc if dc aborted.
-
- The decision to have bc exit upon encountering an inaccessible input file
- is based on the belief that bc file1 file2 is used most often when at
- least file1 contains data/function declarations/initializations. Having
- bc continue with prerequisite files missing is probably not useful.
- There is no implication in the Consequences of Errors subclause that bc
- must check all its files for accessibility before opening any of them.
-
- There was considerable debate on the appropriateness of the language
- accepted by bc. Several members of the balloting group preferred to see
- either a pure subset of the C language or some changes to make the
- language more compatible with C. While the bc language has some obvious
- similarities to C, it has never claimed to be compatible with any version
- of C. An interpreter for a subset of C might be a very worthwhile
- utility, and it could potentially make bc obsolete. However, no such
- utility is known in existing practice, and it was not within the scope of
- POSIX.2 to define such a language and utility. If and when they are
- defined, it may be appropriate to include them in a future revision of
- this standard. This left the following alternatives:
-
- (1) Exclude any calculator language from the standard.
-
- The consensus of the working group was that a simple
- programmatic calculator language is very useful. Also, an
- interactive version of such a calculator would be very important
- for the POSIX.2a revision. The only arguments for excluding any
- calculator were that it would become obsolete if and when a C-
- compatible one emerged, or that the absence would encourage the
- development of such a C-compatible one. These arguments did not
- suff